Overview

Dataset statistics

Number of variables: 36
Number of observations: 413412
Missing cells: 2449654
Missing cells (%): 16.5%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 113.5 MiB
Average record size in memory: 288.0 B
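
The percentages in the dataset statistics can be reproduced from the raw counts. The following is a minimal sketch, assuming that "Missing cells (%)" is taken over all cells (variables × observations) and that 1 MiB is 2**20 bytes; the variable names are illustrative and not part of the report.

```python
# Counts copied from the dataset statistics above.
n_variables = 36
n_observations = 413_412
missing_cells = 2_449_654
total_size_mib = 113.5

# Assumption: the percentage is computed over every cell in the table.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Assumption: 1 MiB = 2**20 bytes. The total size is itself rounded,
# so this only approximates the reported 288.0 B.
avg_record_bytes = total_size_mib * 2**20 / n_observations

print(f"total cells: {total_cells}")
print(f"missing cells: {missing_pct:.1f}%")
print(f"average record size: {avg_record_bytes:.1f} B")
```

An average of 288 B per record is also what 36 columns at 8 bytes each would give, which suggests a shallow memory measurement; that reading is an assumption, since the report does not say how the size was measured.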

Variable types

Numeric: 11
Categorical: 25

Warnings

CMPLNT_FR_DT has a high cardinality: 1792 distinct values
CMPLNT_FR_TM has a high cardinality: 1440 distinct values
CMPLNT_TO_DT has a high cardinality: 1264 distinct values
CMPLNT_TO_TM has a high cardinality: 1440 distinct values
OFNS_DESC has a high cardinality: 59 distinct values
PARKS_NM has a high cardinality: 508 distinct values
PD_DESC has a high cardinality: 336 distinct values
PREM_TYP_DESC has a high cardinality: 74 distinct values
RPT_DT has a high cardinality: 366 distinct values
STATION_NAME has a high cardinality: 362 distinct values
Lat_Lon has a high cardinality: 67403 distinct values
New Georeferenced Column has a high cardinality: 67403 distinct values
X_COORD_CD is highly correlated with Longitude
Y_COORD_CD is highly correlated with Latitude
Latitude is highly correlated with Y_COORD_CD
Longitude is highly correlated with X_COORD_CD
LAW_CAT_CD is highly correlated with OFNS_DESC
BORO_NM is highly correlated with HADEVELOPT and 1 other field
HADEVELOPT is highly correlated with BORO_NM and 1 other field
OFNS_DESC is highly correlated with LAW_CAT_CD
PATROL_BORO is highly correlated with BORO_NM and 1 other field
CMPLNT_TO_DT has 39104 (9.5%) missing values
CMPLNT_TO_TM has 38979 (9.4%) missing values
HADEVELOPT has 411842 (99.6%) missing values
HOUSING_PSA has 382445 (92.5%) missing values
LOC_OF_OCCUR_DESC has 66086 (16.0%) missing values
PARKS_NM has 410736 (99.4%) missing values
STATION_NAME has 406179 (98.3%) missing values
SUSP_AGE_GROUP has 94862 (22.9%) missing values
SUSP_RACE has 94862 (22.9%) missing values
SUSP_SEX has 94862 (22.9%) missing values
TRANSIT_DISTRICT has 406179 (98.3%) missing values
CMPLNT_NUM has unique values
JURISDICTION_CODE has 371665 (89.9%) zeros

Reproduction

Analysis started: 2021-03-06 21:34:46.900655
Analysis finished: 2021-03-06 21:39:34.857821
Duration: 4 minutes and 47.96 seconds
Software version: pandas-profiling v2.10.1
Download configuration: config.yaml

Variables

CMPLNT_NUM
Real number (ℝ≥0)

UNIQUE

Distinct: 413412
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 549358625.6
Minimum: 100001361
Maximum: 999998911
Zeros: 0
Zeros (%): 0.0%
Memory size: 3.2 MiB

Quantile statistics

Minimum: 100001361
5-th percentile: 144865971.2
Q1: 323777360.8
median: 549621616
Q3: 774108179.5
95-th percentile: 954860935.4
Maximum: 999998911
Range: 899997550
Interquartile range (IQR): 450330818.8

Descriptive statistics

Standard deviation: 259732435.9
Coefficient of variation (CV): 0.4727921321
Kurtosis: -1.199660895
Mean: 549358625.6
Median Absolute Deviation (MAD): 225219565
Skewness: 0.002008709704
Sum: 2.271114481 × 10^14
Variance: 6.746093825 × 10^16
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
635441150 | 1 | < 0.1%
421827135 | 1 | < 0.1%
588528170 | 1 | < 0.1%
652489259 | 1 | < 0.1%
307497516 | 1 | < 0.1%
483715632 | 1 | < 0.1%
362078769 | 1 | < 0.1%
253020724 | 1 | < 0.1%
277127737 | 1 | < 0.1%
557101627 | 1 | < 0.1%
Other values (413402) | 413402 | > 99.9%

Value | Count | Frequency (%)
100001361 | 1 | < 0.1%
100003492 | 1 | < 0.1%
100005651 | 1 | < 0.1%
100008265 | 1 | < 0.1%
100010196 | 1 | < 0.1%

Value | Count | Frequency (%)
999998911 | 1 | < 0.1%
999997084 | 1 | < 0.1%
999993654 | 1 | < 0.1%
999993350 | 1 | < 0.1%
999987110 | 1 | < 0.1%
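
Several of the CMPLNT_NUM statistics are simple functions of the others, so they can be cross-checked against each other. The sketch below assumes the usual definitions (range = max − min, IQR = Q3 − Q1, CV = standard deviation / mean, variance = standard deviation squared); the report does not spell these out, and the variable names are illustrative.

```python
import math

# Values copied from the CMPLNT_NUM statistics above.
minimum, maximum = 100_001_361, 999_998_911
q1, q3 = 323_777_360.8, 774_108_179.5
mean, std = 549_358_625.6, 259_732_435.9

value_range = maximum - minimum   # reported: 899997550
iqr = q3 - q1                     # reported: 450330818.8 (inputs are rounded)
cv = std / mean                   # reported: 0.4727921321
variance = std ** 2               # reported: 6.746093825 × 10^16

# For comparison only: the standard deviation of a continuous uniform
# distribution spanning the same range.
uniform_std = value_range / math.sqrt(12)

print(value_range, round(iqr, 1), round(cv, 6))
print(f"{variance:.6e}", f"{uniform_std:.1f}")
```

The near-zero skewness, the kurtosis close to -1.2, and a standard deviation close to range/√12 are all consistent with identifiers spread roughly uniformly over their range.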

ADDR_PCT_CD
Real number (ℝ≥0)

Distinct: 77
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 63.9899761
Minimum: 1
Maximum: 123
Zeros: 0
Zeros (%): 0.0%
Memory size: 3.2 MiB

Quantile statistics

Minimum: 1
5-th percentile: 9
Q1: 40
median: 66
Q3: 101
95-th percentile: 115
Maximum: 123
Range: 122
Interquartile range (IQR): 61

Descriptive statistics

Standard deviation: 34.47196848
Coefficient of variation (CV): 0.5387088819
Kurtosis: -1.155445096
Mean: 63.9899761
Median Absolute Deviation (MAD): 28
Skewness: 0.0356859667
Sum: 26454224
Variance: 1188.316611
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
75 | 12814 | 3.1%
40 | 10630 | 2.6%
44 | 9579 | 2.3%
43 | 9369 | 2.3%
47 | 8966 | 2.2%
114 | 8850 | 2.1%
46 | 8775 | 2.1%
52 | 8288 | 2.0%
42 | 7860 | 1.9%
73 | 7689 | 1.9%
Other values (67) | 320592 | 77.5%

Value | Count | Frequency (%)
1 | 4960 | 1.2%
5 | 3320 | 0.8%
6 | 4613 | 1.1%
7 | 3860 | 0.9%
9 | 4222 | 1.0%

Value | Count | Frequency (%)
123 | 2024 | 0.5%
122 | 4235 | 1.0%
121 | 4899 | 1.2%
120 | 5845 | 1.4%
115 | 7197 | 1.7%

BORO_NM
Categorical

HIGH CORRELATION

Distinct: 5
Distinct (%): < 0.1%
Missing: 485
Missing (%): 0.1%
Memory size: 3.2 MiB
BROOKLYN | 119208
MANHATTAN | 97365
BRONX | 90446
QUEENS | 88922
STATEN ISLAND | 16986

Length

Max length: 13
Median length: 8
Mean length: 7.353670261
Min length: 5

Characters and Unicode

Total characters: 3036529
Distinct characters: 19
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: BRONX
2nd row: BRONX
3rd row: BRONX
4th row: BRONX
5th row: QUEENS
Value | Count | Frequency (%)
BROOKLYN | 119208 | 28.8%
MANHATTAN | 97365 | 23.6%
BRONX | 90446 | 21.9%
QUEENS | 88922 | 21.5%
STATEN ISLAND | 16986 | 4.1%
(Missing) | 485 | 0.1%
Histogram of lengths of the category
Value | Count | Frequency (%)
brooklyn | 119208 | 27.7%
manhattan | 97365 | 22.6%
bronx | 90446 | 21.0%
queens | 88922 | 20.7%
staten | 16986 | 4.0%
island | 16986 | 4.0%
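
The lower-cased table above counts individual words rather than whole category values, which is why STATEN ISLAND appears as two rows and why the percentages differ from the category table. A minimal sketch of that counting, assuming plain lower-casing and whitespace splitting (the report's actual tokenizer is not shown, so this is an approximation):

```python
from collections import Counter

# Category counts copied from the BORO_NM frequency table.
category_counts = {
    "BROOKLYN": 119_208,
    "MANHATTAN": 97_365,
    "BRONX": 90_446,
    "QUEENS": 88_922,
    "STATEN ISLAND": 16_986,
}

# Each category contributes its row count to every word it contains.
word_counts = Counter()
for category, count in category_counts.items():
    for word in category.lower().split():
        word_counts[word] += count

# The denominator is the total number of words, not the number of rows.
total_words = sum(word_counts.values())
for word, count in word_counts.most_common():
    print(f"{word:<10} {count:>7} {100 * count / total_words:5.1f}%")
```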

Most occurring characters

ValueCountFrequency (%)
N527278
17.4%
O328862
10.8%
A326067
10.7%
T228702
 
7.5%
B209654
 
6.9%
R209654
 
6.9%
E194830
 
6.4%
L136194
 
4.5%
S122894
 
4.0%
K119208
 
3.9%
Other values (9)633186
20.9%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter3019543
99.4%
Space Separator16986
 
0.6%

Most frequent character per category

ValueCountFrequency (%)
N527278
17.5%
O328862
10.9%
A326067
10.8%
T228702
 
7.6%
B209654
 
6.9%
R209654
 
6.9%
E194830
 
6.5%
L136194
 
4.5%
S122894
 
4.1%
K119208
 
3.9%
Other values (8)616200
20.4%
ValueCountFrequency (%)
16986
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin3019543
99.4%
Common16986
 
0.6%

Most frequent character per script

ValueCountFrequency (%)
N527278
17.5%
O328862
10.9%
A326067
10.8%
T228702
 
7.6%
B209654
 
6.9%
R209654
 
6.9%
E194830
 
6.5%
L136194
 
4.5%
S122894
 
4.1%
K119208
 
3.9%
Other values (8)616200
20.4%
ValueCountFrequency (%)
16986
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII3036529
100.0%

Most frequent character per block

ValueCountFrequency (%)
N527278
17.4%
O328862
10.8%
A326067
10.7%
T228702
 
7.5%
B209654
 
6.9%
R209654
 
6.9%
E194830
 
6.4%
L136194
 
4.5%
S122894
 
4.0%
K119208
 
3.9%
Other values (9)633186
20.9%

CMPLNT_FR_DT
Categorical

HIGH CARDINALITY

Distinct: 1792
Distinct (%): 0.4%
Missing: 0
Missing (%): 0.0%
Memory size: 3.2 MiB
06/01/2020 | 1857
01/01/2020 | 1733
01/15/2020 | 1439
06/02/2020 | 1392
08/01/2020 | 1388
Other values (1787) | 405603

Length

Max length: 10
Median length: 10
Mean length: 10
Min length: 10

Characters and Unicode

Total characters: 4134120
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 797
Unique (%): 0.2%

Sample

1st row: 12/23/2020
2nd row: 12/21/2020
3rd row: 11/22/2020
4th row: 11/22/2020
5th row: 11/21/2020
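
CMPLNT_FR_DT is profiled as a categorical string, which is why it triggers the high-cardinality warning. If date semantics are needed, the values can be parsed before profiling. The following is a minimal sketch using only the standard library, assuming every value follows the MM/DD/YYYY pattern seen in the sample rows; the helper name `parse_complaint_date` is hypothetical.

```python
from datetime import date, datetime


def parse_complaint_date(text: str) -> date:
    """Parse an MM/DD/YYYY string such as '12/23/2020' into a date."""
    return datetime.strptime(text, "%m/%d/%Y").date()


# The five sample rows shown for CMPLNT_FR_DT.
sample_rows = ["12/23/2020", "12/21/2020", "11/22/2020", "11/22/2020", "11/21/2020"]
parsed = [parse_complaint_date(s) for s in sample_rows]

# Once parsed, the values sort and compare chronologically.
print(min(parsed), max(parsed))
```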
ValueCountFrequency (%)
06/01/20201857
 
0.4%
01/01/20201733
 
0.4%
01/15/20201439
 
0.3%
06/02/20201392
 
0.3%
08/01/20201388
 
0.3%
01/31/20201377
 
0.3%
01/14/20201371
 
0.3%
03/13/20201366
 
0.3%
08/14/20201365
 
0.3%
10/23/20201360
 
0.3%
Other values (1782)398764
96.5%
Histogram of lengths of the category
ValueCountFrequency (%)
06/01/20201857
 
0.4%
01/01/20201733
 
0.4%
01/15/20201439
 
0.3%
06/02/20201392
 
0.3%
08/01/20201388
 
0.3%
01/31/20201377
 
0.3%
01/14/20201371
 
0.3%
03/13/20201366
 
0.3%
08/14/20201365
 
0.3%
10/23/20201360
 
0.3%
Other values (1782)398764
96.5%

Most occurring characters

ValueCountFrequency (%)
01328012
32.1%
21061817
25.7%
/826824
20.0%
1376303
 
9.1%
393187
 
2.3%
982130
 
2.0%
878604
 
1.9%
776355
 
1.8%
573065
 
1.8%
672235
 
1.7%

Most occurring categories

ValueCountFrequency (%)
Decimal Number3307296
80.0%
Other Punctuation826824
 
20.0%
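
The category totals for CMPLNT_FR_DT follow directly from the fixed MM/DD/YYYY shape: eight digits and two slashes per value. A short sketch using the standard library's `unicodedata.category`, assuming every one of the 413412 values has the same shape as the first sample row ("Nd" is the Unicode code for Decimal Number and "Po" for Other Punctuation):

```python
import unicodedata
from collections import Counter

n_rows = 413_412        # CMPLNT_FR_DT has no missing values
sample = "12/23/2020"   # first sample row; assumed representative of every value

# Count the Unicode general category of each character in one value.
per_value = Counter(unicodedata.category(ch) for ch in sample)

# Scale up to the whole column.
totals = {category: n * n_rows for category, n in per_value.items()}
print(totals)  # {'Nd': 3307296, 'Po': 826824}
```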

Most frequent character per category

ValueCountFrequency (%)
01328012
40.2%
21061817
32.1%
1376303
 
11.4%
393187
 
2.8%
982130
 
2.5%
878604
 
2.4%
776355
 
2.3%
573065
 
2.2%
672235
 
2.2%
465588
 
2.0%
ValueCountFrequency (%)
/826824
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common4134120
100.0%

Most frequent character per script

ValueCountFrequency (%)
01328012
32.1%
21061817
25.7%
/826824
20.0%
1376303
 
9.1%
393187
 
2.3%
982130
 
2.0%
878604
 
1.9%
776355
 
1.8%
573065
 
1.8%
672235
 
1.7%

Most occurring blocks

ValueCountFrequency (%)
ASCII4134120
100.0%

Most frequent character per block

ValueCountFrequency (%)
01328012
32.1%
21061817
25.7%
/826824
20.0%
1376303
 
9.1%
393187
 
2.3%
982130
 
2.0%
878604
 
1.9%
776355
 
1.8%
573065
 
1.8%
672235
 
1.7%

CMPLNT_FR_TM
Categorical

HIGH CARDINALITY

Distinct: 1440
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Memory size: 3.2 MiB
12:00:00 | 10567
15:00:00 | 8696
18:00:00 | 8654
17:00:00 | 8405
20:00:00 | 8221
Other values (1435) | 368869

Length

Max length: 8
Median length: 8
Mean length: 8
Min length: 8

Characters and Unicode

Total characters: 3307296
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 19:50:00
2nd row: 01:10:00
3rd row: 22:00:00
4th row: 09:50:00
5th row: 15:38:00
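
The 1440 distinct values reported for CMPLNT_FR_TM equal the number of minutes in a day. That is what one would expect if times are recorded to the minute with the seconds field always 00, as in the sample rows; the report does not confirm this for every row, so treat it as an assumption. A small sketch that enumerates those strings:

```python
from datetime import time

# Every HH:MM:00 string for a 24-hour day.
all_minutes = {
    time(hour, minute).strftime("%H:%M:%S")
    for hour in range(24)
    for minute in range(60)
}

print(len(all_minutes))
print("19:50:00" in all_minutes)
```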
ValueCountFrequency (%)
12:00:0010567
 
2.6%
15:00:008696
 
2.1%
18:00:008654
 
2.1%
17:00:008405
 
2.0%
20:00:008221
 
2.0%
16:00:007981
 
1.9%
19:00:007766
 
1.9%
14:00:006986
 
1.7%
21:00:006739
 
1.6%
22:00:006720
 
1.6%
Other values (1430)332677
80.5%
Histogram of lengths of the category
ValueCountFrequency (%)
12:00:0010567
 
2.6%
15:00:008696
 
2.1%
18:00:008654
 
2.1%
17:00:008405
 
2.0%
20:00:008221
 
2.0%
16:00:007981
 
1.9%
19:00:007766
 
1.9%
14:00:006986
 
1.7%
21:00:006739
 
1.6%
22:00:006720
 
1.6%
Other values (1430)332677
80.5%

Most occurring characters

ValueCountFrequency (%)
01428155
43.2%
:826824
25.0%
1329855
 
10.0%
2182711
 
5.5%
5139785
 
4.2%
3133412
 
4.0%
486310
 
2.6%
849910
 
1.5%
947333
 
1.4%
743263
 
1.3%

Most occurring categories

ValueCountFrequency (%)
Decimal Number2480472
75.0%
Other Punctuation826824
 
25.0%

Most frequent character per category

ValueCountFrequency (%)
01428155
57.6%
1329855
 
13.3%
2182711
 
7.4%
5139785
 
5.6%
3133412
 
5.4%
486310
 
3.5%
849910
 
2.0%
947333
 
1.9%
743263
 
1.7%
639738
 
1.6%
ValueCountFrequency (%)
:826824
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common3307296
100.0%

Most frequent character per script

ValueCountFrequency (%)
01428155
43.2%
:826824
25.0%
1329855
 
10.0%
2182711
 
5.5%
5139785
 
4.2%
3133412
 
4.0%
486310
 
2.6%
849910
 
1.5%
947333
 
1.4%
743263
 
1.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII3307296
100.0%

Most frequent character per block

ValueCountFrequency (%)
01428155
43.2%
:826824
25.0%
1329855
 
10.0%
2182711
 
5.5%
5139785
 
4.2%
3133412
 
4.0%
486310
 
2.6%
849910
 
1.5%
947333
 
1.4%
743263
 
1.3%

CMPLNT_TO_DT
Categorical

HIGH CARDINALITY
MISSING

Distinct: 1264
Distinct (%): 0.3%
Missing: 39104
Missing (%): 9.5%
Memory size: 3.2 MiB
06/01/2020 | 1461
06/02/2020 | 1425
10/21/2020 | 1272
01/15/2020 | 1271
01/14/2020 | 1239
Other values (1259) | 367640

Length

Max length: 10
Median length: 10
Mean length: 10
Min length: 10

Characters and Unicode

Total characters: 3743080
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 517
Unique (%): 0.1%

Sample

1st row: 06/01/2020
2nd row: 12/29/2020
3rd row: 12/23/2020
4th row: 12/31/2020
5th row: 12/22/2020
ValueCountFrequency (%)
06/01/20201461
 
0.4%
06/02/20201425
 
0.3%
10/21/20201272
 
0.3%
01/15/20201271
 
0.3%
01/14/20201239
 
0.3%
01/31/20201238
 
0.3%
01/01/20201230
 
0.3%
02/03/20201227
 
0.3%
09/08/20201223
 
0.3%
10/06/20201212
 
0.3%
Other values (1254)361510
87.4%
(Missing)39104
 
9.5%
Histogram of lengths of the category
ValueCountFrequency (%)
06/01/20201461
 
0.4%
06/02/20201425
 
0.4%
10/21/20201272
 
0.3%
01/15/20201271
 
0.3%
01/14/20201239
 
0.3%
01/31/20201238
 
0.3%
01/01/20201230
 
0.3%
02/03/20201227
 
0.3%
09/08/20201223
 
0.3%
10/06/20201212
 
0.3%
Other values (1254)361510
96.6%

Most occurring characters

ValueCountFrequency (%)
01202748
32.1%
2966576
25.8%
/748616
20.0%
1336845
 
9.0%
384175
 
2.2%
972337
 
1.9%
871191
 
1.9%
769351
 
1.9%
566162
 
1.8%
665795
 
1.8%

Most occurring categories

ValueCountFrequency (%)
Decimal Number2994464
80.0%
Other Punctuation748616
 
20.0%

Most frequent character per category

ValueCountFrequency (%)
01202748
40.2%
2966576
32.3%
1336845
 
11.2%
384175
 
2.8%
972337
 
2.4%
871191
 
2.4%
769351
 
2.3%
566162
 
2.2%
665795
 
2.2%
459284
 
2.0%
ValueCountFrequency (%)
/748616
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common3743080
100.0%

Most frequent character per script

ValueCountFrequency (%)
01202748
32.1%
2966576
25.8%
/748616
20.0%
1336845
 
9.0%
384175
 
2.2%
972337
 
1.9%
871191
 
1.9%
769351
 
1.9%
566162
 
1.8%
665795
 
1.8%

Most occurring blocks

ValueCountFrequency (%)
ASCII3743080
100.0%

Most frequent character per block

ValueCountFrequency (%)
01202748
32.1%
2966576
25.8%
/748616
20.0%
1336845
 
9.0%
384175
 
2.2%
972337
 
1.9%
871191
 
1.9%
769351
 
1.9%
566162
 
1.8%
665795
 
1.8%

CMPLNT_TO_TM
Categorical

HIGH CARDINALITY
MISSING

Distinct: 1440
Distinct (%): 0.4%
Missing: 38979
Missing (%): 9.4%
Memory size: 3.2 MiB
12:00:00 | 6091
15:00:00 | 5491
10:00:00 | 4977
08:00:00 | 4876
17:00:00 | 4830
Other values (1435) | 348168

Length

Max length: 8
Median length: 8
Mean length: 8
Min length: 8

Characters and Unicode

Total characters: 2995464
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 13:30:00
2nd row: 22:12:00
3rd row: 07:34:00
4th row: 09:00:00
5th row: 12:22:00
ValueCountFrequency (%)
12:00:006091
 
1.5%
15:00:005491
 
1.3%
10:00:004977
 
1.2%
08:00:004876
 
1.2%
17:00:004830
 
1.2%
16:00:004769
 
1.2%
09:00:004765
 
1.2%
14:00:004522
 
1.1%
13:00:004477
 
1.1%
18:00:004440
 
1.1%
Other values (1430)325195
78.7%
(Missing)38979
 
9.4%
Histogram of lengths of the category
ValueCountFrequency (%)
12:00:006091
 
1.6%
15:00:005491
 
1.5%
10:00:004977
 
1.3%
08:00:004876
 
1.3%
17:00:004830
 
1.3%
16:00:004769
 
1.3%
09:00:004765
 
1.3%
14:00:004522
 
1.2%
13:00:004477
 
1.2%
18:00:004440
 
1.2%
Other values (1430)325195
86.8%

Most occurring characters

ValueCountFrequency (%)
01215010
40.6%
:748866
25.0%
1309012
 
10.3%
2164481
 
5.5%
5155010
 
5.2%
3130332
 
4.4%
487381
 
2.9%
849409
 
1.6%
947345
 
1.6%
745773
 
1.5%

Most occurring categories

ValueCountFrequency (%)
Decimal Number2246598
75.0%
Other Punctuation748866
 
25.0%

Most frequent character per category

ValueCountFrequency (%)
01215010
54.1%
1309012
 
13.8%
2164481
 
7.3%
5155010
 
6.9%
3130332
 
5.8%
487381
 
3.9%
849409
 
2.2%
947345
 
2.1%
745773
 
2.0%
642845
 
1.9%
ValueCountFrequency (%)
:748866
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common2995464
100.0%

Most frequent character per script

ValueCountFrequency (%)
01215010
40.6%
:748866
25.0%
1309012
 
10.3%
2164481
 
5.5%
5155010
 
5.2%
3130332
 
4.4%
487381
 
2.9%
849409
 
1.6%
947345
 
1.6%
745773
 
1.5%

Most occurring blocks

ValueCountFrequency (%)
ASCII2995464
100.0%

Most frequent character per block

ValueCountFrequency (%)
01215010
40.6%
:748866
25.0%
1309012
 
10.3%
2164481
 
5.5%
5155010
 
5.2%
3130332
 
4.4%
487381
 
2.9%
849409
 
1.6%
947345
 
1.6%
745773
 
1.5%

CRM_ATPT_CPTD_CD
Categorical

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 3.2 MiB
COMPLETED | 406904
ATTEMPTED | 6508

Length

Max length: 9
Median length: 9
Mean length: 9
Min length: 9

Characters and Unicode

Total characters: 3720708
Distinct characters: 9
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: COMPLETED
2nd row: COMPLETED
3rd row: COMPLETED
4th row: COMPLETED
5th row: COMPLETED
Value | Count | Frequency (%)
COMPLETED | 406904 | 98.4%
ATTEMPTED | 6508 | 1.6%
Histogram of lengths of the category
Value | Count | Frequency (%)
completed | 406904 | 98.4%
attempted | 6508 | 1.6%

Most occurring characters

ValueCountFrequency (%)
E826824
22.2%
T426428
11.5%
M413412
11.1%
P413412
11.1%
D413412
11.1%
C406904
10.9%
O406904
10.9%
L406904
10.9%
A6508
 
0.2%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter3720708
100.0%

Most frequent character per category

ValueCountFrequency (%)
E826824
22.2%
T426428
11.5%
M413412
11.1%
P413412
11.1%
D413412
11.1%
C406904
10.9%
O406904
10.9%
L406904
10.9%
A6508
 
0.2%

Most occurring scripts

ValueCountFrequency (%)
Latin3720708
100.0%

Most frequent character per script

ValueCountFrequency (%)
E826824
22.2%
T426428
11.5%
M413412
11.1%
P413412
11.1%
D413412
11.1%
C406904
10.9%
O406904
10.9%
L406904
10.9%
A6508
 
0.2%

Most occurring blocks

ValueCountFrequency (%)
ASCII3720708
100.0%

Most frequent character per block

ValueCountFrequency (%)
E826824
22.2%
T426428
11.5%
M413412
11.1%
P413412
11.1%
D413412
11.1%
C406904
10.9%
O406904
10.9%
L406904
10.9%
A6508
 
0.2%

HADEVELOPT
Categorical

HIGH CORRELATION
MISSING

Distinct: 25
Distinct (%): 1.6%
Missing: 411842
Missing (%): 99.6%
Memory size: 3.2 MiB
INGERSOLL | 253
WALD | 167
MANHATTANVILLE | 113
GRANT | 107
WHITMAN | 102
Other values (20) | 828

Length

Max length: 31
Median length: 8
Mean length: 8.817834395
Min length: 4

Characters and Unicode

Total characters: 13844
Distinct characters: 27
Distinct categories: 5
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 1
Unique (%): 0.1%

Sample

1st row: WHITMAN
2nd row: NOSTRAND
3rd row: WILLIAMSBURG
4th row: WALD
5th row: RIIS
ValueCountFrequency (%)
INGERSOLL253
 
0.1%
WALD167
 
< 0.1%
MANHATTANVILLE113
 
< 0.1%
GRANT107
 
< 0.1%
WHITMAN102
 
< 0.1%
WILLIAMSBURG98
 
< 0.1%
NOSTRAND96
 
< 0.1%
MARBLE HILL92
 
< 0.1%
RIIS81
 
< 0.1%
SHEEPSHEAD BAY76
 
< 0.1%
Other values (15)385
 
0.1%
(Missing)411842
99.6%
Histogram of lengths of the category
ValueCountFrequency (%)
ingersoll253
 
12.6%
wald167
 
8.3%
riis124
 
6.2%
manhattanville113
 
5.6%
grant107
 
5.3%
whitman102
 
5.1%
williamsburg98
 
4.9%
nostrand96
 
4.8%
hill92
 
4.6%
marble92
 
4.6%
Other values (25)771
38.3%

Most occurring characters

ValueCountFrequency (%)
L1497
10.8%
A1367
 
9.9%
I1291
 
9.3%
R1061
 
7.7%
S1004
 
7.3%
E991
 
7.2%
N945
 
6.8%
O841
 
6.1%
T688
 
5.0%
H617
 
4.5%
Other values (17)3542
25.6%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter13366
96.5%
Space Separator445
 
3.2%
Open Punctuation11
 
0.1%
Decimal Number11
 
0.1%
Close Punctuation11
 
0.1%

Most frequent character per category

ValueCountFrequency (%)
L1497
11.2%
A1367
10.2%
I1291
9.7%
R1061
 
7.9%
S1004
 
7.5%
E991
 
7.4%
N945
 
7.1%
O841
 
6.3%
T688
 
5.1%
H617
 
4.6%
Other values (13)3064
22.9%
ValueCountFrequency (%)
445
100.0%
ValueCountFrequency (%)
(11
100.0%
ValueCountFrequency (%)
511
100.0%
ValueCountFrequency (%)
)11
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin13366
96.5%
Common478
 
3.5%

Most frequent character per script

ValueCountFrequency (%)
L1497
11.2%
A1367
10.2%
I1291
9.7%
R1061
 
7.9%
S1004
 
7.5%
E991
 
7.4%
N945
 
7.1%
O841
 
6.3%
T688
 
5.1%
H617
 
4.6%
Other values (13)3064
22.9%
ValueCountFrequency (%)
445
93.1%
(11
 
2.3%
511
 
2.3%
)11
 
2.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII13844
100.0%

Most frequent character per block

ValueCountFrequency (%)
L1497
10.8%
A1367
 
9.9%
I1291
 
9.3%
R1061
 
7.7%
S1004
 
7.3%
E991
 
7.2%
N945
 
6.8%
O841
 
6.1%
T688
 
5.0%
H617
 
4.5%
Other values (17)3542
25.6%

HOUSING_PSA
Real number (ℝ≥0)

MISSING

Distinct: 339
Distinct (%): 1.1%
Missing: 382445
Missing (%): 92.5%
Infinite: 0
Infinite (%): 0.0%
Mean: 7194.91407
Minimum: 218
Maximum: 71750
Zeros: 0
Zeros (%): 0.0%
Memory size: 3.2 MiB

Quantile statistics

Minimum: 218
5-th percentile: 258
Q1: 489
median: 720
Q3: 1269
95-th percentile: 44066
Maximum: 71750
Range: 71532
Interquartile range (IQR): 780

Descriptive statistics

Standard deviation: 14347.62114
Coefficient of variation (CV): 1.994133774
Kurtosis: 2.563301578
Mean: 7194.91407
Median Absolute Deviation (MAD): 267
Skewness: 2.009575782
Sum: 222804904
Variance: 205854232.5
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
670 | 474 | 0.1%
887 | 411 | 0.1%
720 | 405 | 0.1%
1233 | 368 | 0.1%
544 | 364 | 0.1%
4552 | 363 | 0.1%
1251 | 351 | 0.1%
609 | 345 | 0.1%
590 | 342 | 0.1%
527 | 328 | 0.1%
Other values (329) | 27216 | 6.6%
(Missing) | 382445 | 92.5%

Value | Count | Frequency (%)
218 | 259 | 0.1%
227 | 301 | 0.1%
235 | 108 | < 0.1%
238 | 49 | < 0.1%
240 | 30 | < 0.1%

Value | Count | Frequency (%)
71750 | 3 | < 0.1%
70679 | 1 | < 0.1%
66871 | 2 | < 0.1%
66563 | 124 | < 0.1%
64443 | 5 | < 0.1%

JURISDICTION_CODE
Real number (ℝ≥0)

ZEROS

Distinct: 18
Distinct (%): < 0.1%
Missing: 463
Missing (%): 0.1%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.643767148
Minimum: 0
Maximum: 97
Zeros: 371665
Zeros (%): 89.9%
Memory size: 3.2 MiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 0
Q3: 0
95-th percentile: 2
Maximum: 97
Range: 97
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 6.574193338
Coefficient of variation (CV): 10.21206714
Kurtosis: 198.1779824
Mean: 0.643767148
Median Absolute Deviation (MAD): 0
Skewness: 14.04958619
Sum: 265843
Variance: 43.22001804
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=18)
Value | Count | Frequency (%)
0 | 371665 | 89.9%
2 | 30589 | 7.4%
1 | 7217 | 1.7%
97 | 1550 | 0.4%
3 | 1040 | 0.3%
88 | 251 | 0.1%
72 | 213 | 0.1%
14 | 150 | < 0.1%
4 | 87 | < 0.1%
11 | 79 | < 0.1%
Other values (8) | 108 | < 0.1%
(Missing) | 463 | 0.1%

Value | Count | Frequency (%)
0 | 371665 | 89.9%
1 | 7217 | 1.7%
2 | 30589 | 7.4%
3 | 1040 | 0.3%
4 | 87 | < 0.1%

Value | Count | Frequency (%)
97 | 1550 | 0.4%
88 | 251 | 0.1%
87 | 21 | < 0.1%
85 | 5 | < 0.1%
72 | 213 | 0.1%
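
The JURISDICTION_CODE figures show which denominators the report appears to use: the mean is consistent with the sum divided by the non-missing count, while "Zeros (%)" is consistent with a denominator of all rows. The sketch below checks this from the reported numbers; it is an inference from the figures, not a statement about the profiler's internals, and the variable names are illustrative.

```python
# Counts copied from the JURISDICTION_CODE statistics above.
n_observations = 413_412
n_missing = 463
n_zeros = 371_665
value_sum = 265_843

n_present = n_observations - n_missing

# Mean over non-missing values only.
mean = value_sum / n_present

# Share of zeros with two candidate denominators.
zeros_pct_all_rows = 100 * n_zeros / n_observations
zeros_pct_present = 100 * n_zeros / n_present

print(f"mean: {mean:.9f}")
print(f"zeros over all rows: {zeros_pct_all_rows:.1f}%")
print(f"zeros over non-missing rows: {zeros_pct_present:.1f}%")
```

Only the all-rows denominator rounds to the reported 89.9%, and the computed mean agrees with the reported 0.643767148.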

JURIS_DESC
Categorical

Distinct: 18
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 3.2 MiB
N.Y. POLICE DEPT | 372050
N.Y. HOUSING POLICE | 30660
N.Y. TRANSIT POLICE | 7223
OTHER | 1551
PORT AUTHORITY | 1040
Other values (13) | 888

Length

Max length: 28
Median length: 16
Mean length: 16.22790824
Min length: 5

Characters and Unicode

Total characters: 6708812
Distinct characters: 26
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: N.Y. POLICE DEPT
2nd row: N.Y. POLICE DEPT
3rd row: N.Y. POLICE DEPT
4th row: N.Y. POLICE DEPT
5th row: N.Y. HOUSING POLICE
ValueCountFrequency (%)
N.Y. POLICE DEPT372050
90.0%
N.Y. HOUSING POLICE30660
 
7.4%
N.Y. TRANSIT POLICE7223
 
1.7%
OTHER1551
 
0.4%
PORT AUTHORITY1040
 
0.3%
NYC PARKS251
 
0.1%
DEPT OF CORRECTIONS213
 
0.1%
HEALTH & HOSP CORP150
 
< 0.1%
TRI-BORO BRDG TUNNL87
 
< 0.1%
N.Y. STATE POLICE79
 
< 0.1%
Other values (8)108
 
< 0.1%
Histogram of lengths of the category
ValueCountFrequency (%)
police410030
33.2%
n.y410027
33.2%
dept372268
30.1%
housing30660
 
2.5%
transit7223
 
0.6%
other1551
 
0.1%
port1040
 
0.1%
authority1040
 
0.1%
parks266
 
< 0.1%
nyc251
 
< 0.1%
Other values (31)1668
 
0.1%

Most occurring characters

ValueCountFrequency (%)
822612
12.3%
.820090
12.2%
E784400
11.7%
P783935
11.7%
I449359
6.7%
N448653
6.7%
O445534
6.6%
Y411365
6.1%
C410908
6.1%
L410285
6.1%
Other values (16)921671
13.7%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter5065873
75.5%
Space Separator822612
 
12.3%
Other Punctuation820240
 
12.2%
Dash Punctuation87
 
< 0.1%

Most frequent character per category

ValueCountFrequency (%)
E784400
15.5%
P783935
15.5%
I449359
8.9%
N448653
8.9%
O445534
8.8%
Y411365
8.1%
C410908
8.1%
L410285
8.1%
T392231
7.7%
D372385
7.4%
Other values (12)156818
 
3.1%
ValueCountFrequency (%)
.820090
> 99.9%
&150
 
< 0.1%
ValueCountFrequency (%)
822612
100.0%
ValueCountFrequency (%)
-87
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin5065873
75.5%
Common1642939
 
24.5%

Most frequent character per script

ValueCountFrequency (%)
E784400
15.5%
P783935
15.5%
I449359
8.9%
N448653
8.9%
O445534
8.8%
Y411365
8.1%
C410908
8.1%
L410285
8.1%
T392231
7.7%
D372385
7.4%
Other values (12)156818
 
3.1%
ValueCountFrequency (%)
822612
50.1%
.820090
49.9%
&150
 
< 0.1%
-87
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII6708812
100.0%

Most frequent character per block

ValueCountFrequency (%)
822612
12.3%
.820090
12.2%
E784400
11.7%
P783935
11.7%
I449359
6.7%
N448653
6.7%
O445534
6.6%
Y411365
6.1%
C410908
6.1%
L410285
6.1%
Other values (16)921671
13.7%

KY_CD
Real number (ℝ≥0)

Distinct: 64
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 304.4962048
Minimum: 101
Maximum: 678
Zeros: 0
Zeros (%): 0.0%
Memory size: 3.2 MiB

Quantile statistics

Minimum: 101
5-th percentile: 106
Q1: 117
median: 341
Q3: 351
95-th percentile: 578
Maximum: 678
Range: 577
Interquartile range (IQR): 234

Descriptive statistics

Standard deviation: 159.5255018
Coefficient of variation (CV): 0.5238998035
Kurtosis: -0.8573247433
Mean: 304.4962048
Median Absolute Deviation (MAD): 108
Skewness: 0.2539769354
Sum: 125882385
Variance: 25448.38573
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
341 | 82061 | 19.8%
578 | 66736 | 16.1%
344 | 43532 | 10.5%
351 | 35711 | 8.6%
109 | 35482 | 8.6%
106 | 20554 | 5.0%
361 | 15685 | 3.8%
107 | 15468 | 3.7%
105 | 13100 | 3.2%
126 | 12332 | 3.0%
Other values (54) | 72751 | 17.6%

Value | Count | Frequency (%)
101 | 463 | 0.1%
102 | 1 | < 0.1%
103 | 13 | < 0.1%
104 | 1426 | 0.3%
105 | 13100 | 3.2%

Value | Count | Frequency (%)
678 | 438 | 0.1%
677 | 11 | < 0.1%
676 | 1 | < 0.1%
675 | 47 | < 0.1%
578 | 66736 | 16.1%

LAW_CAT_CD
Categorical

HIGH CORRELATION

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 3.2 MiB
MISDEMEANOR | 211170
FELONY | 134987
VIOLATION | 67255

Length

Max length: 11
Median length: 11
Mean length: 9.042037967
Min length: 6

Characters and Unicode

Total characters: 3738087
Distinct characters: 14
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: FELONY
2nd row: FELONY
3rd row: FELONY
4th row: FELONY
5th row: FELONY
Value | Count | Frequency (%)
MISDEMEANOR | 211170 | 51.1%
FELONY | 134987 | 32.7%
VIOLATION | 67255 | 16.3%
Histogram of lengths of the category
Value | Count | Frequency (%)
misdemeanor | 211170 | 51.1%
felony | 134987 | 32.7%
violation | 67255 | 16.3%

Most occurring characters

ValueCountFrequency (%)
E557327
14.9%
O480667
12.9%
M422340
11.3%
N413412
11.1%
I345680
9.2%
A278425
7.4%
S211170
 
5.6%
D211170
 
5.6%
R211170
 
5.6%
L202242
 
5.4%
Other values (4)404484
10.8%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter3738087
100.0%

Most frequent character per category

ValueCountFrequency (%)
E557327
14.9%
O480667
12.9%
M422340
11.3%
N413412
11.1%
I345680
9.2%
A278425
7.4%
S211170
 
5.6%
D211170
 
5.6%
R211170
 
5.6%
L202242
 
5.4%
Other values (4)404484
10.8%

Most occurring scripts

ValueCountFrequency (%)
Latin3738087
100.0%

Most frequent character per script

ValueCountFrequency (%)
E557327
14.9%
O480667
12.9%
M422340
11.3%
N413412
11.1%
I345680
9.2%
A278425
7.4%
S211170
 
5.6%
D211170
 
5.6%
R211170
 
5.6%
L202242
 
5.4%
Other values (4)404484
10.8%

Most occurring blocks

ValueCountFrequency (%)
ASCII3738087
100.0%

Most frequent character per block

ValueCountFrequency (%)
E557327
14.9%
O480667
12.9%
M422340
11.3%
N413412
11.1%
I345680
9.2%
A278425
7.4%
S211170
 
5.6%
D211170
 
5.6%
R211170
 
5.6%
L202242
 
5.4%
Other values (4)404484
10.8%

LOC_OF_OCCUR_DESC
Categorical

MISSING

Distinct: 5
Distinct (%): < 0.1%
Missing: 66086
Missing (%): 16.0%
Memory size: 3.2 MiB
INSIDE | 216927
FRONT OF | 111673
OPPOSITE OF | 10027
REAR OF | 8377
OUTSIDE | 322

Length

Max length: 11
Median length: 6
Mean length: 6.812435579
Min length: 6

Characters and Unicode

Total characters: 2366136
Distinct characters: 13
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: OUTSIDE
2nd row: INSIDE
3rd row: INSIDE
4th row: OUTSIDE
5th row: OUTSIDE
Value | Count | Frequency (%)
INSIDE | 216927 | 52.5%
FRONT OF | 111673 | 27.0%
OPPOSITE OF | 10027 | 2.4%
REAR OF | 8377 | 2.0%
OUTSIDE | 322 | 0.1%
(Missing) | 66086 | 16.0%
Histogram of lengths of the category
Value | Count | Frequency (%)
inside | 216927 | 45.4%
of | 130077 | 27.2%
front | 111673 | 23.4%
opposite | 10027 | 2.1%
rear | 8377 | 1.8%
outside | 322 | 0.1%

Most occurring characters

ValueCountFrequency (%)
I444203
18.8%
N328600
13.9%
O262126
11.1%
F241750
10.2%
E235653
10.0%
S227276
9.6%
D217249
9.2%
130077
 
5.5%
R128427
 
5.4%
T122022
 
5.2%
Other values (3)28753
 
1.2%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter2236059
94.5%
Space Separator130077
 
5.5%

Most frequent character per category

ValueCountFrequency (%)
I444203
19.9%
N328600
14.7%
O262126
11.7%
F241750
10.8%
E235653
10.5%
S227276
10.2%
D217249
9.7%
R128427
 
5.7%
T122022
 
5.5%
P20054
 
0.9%
Other values (2)8699
 
0.4%
ValueCountFrequency (%)
130077
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin2236059
94.5%
Common130077
 
5.5%

Most frequent character per script

ValueCountFrequency (%)
I444203
19.9%
N328600
14.7%
O262126
11.7%
F241750
10.8%
E235653
10.5%
S227276
10.2%
D217249
9.7%
R128427
 
5.7%
T122022
 
5.5%
P20054
 
0.9%
Other values (2)8699
 
0.4%
ValueCountFrequency (%)
130077
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII2366136
100.0%

Most frequent character per block

ValueCountFrequency (%)
I444203
18.8%
N328600
13.9%
O262126
11.1%
F241750
10.2%
E235653
10.0%
S227276
9.6%
D217249
9.2%
130077
 
5.5%
R128427
 
5.4%
T122022
 
5.2%
Other values (3)28753
 
1.2%
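
The "Most occurring categories" tables above count characters by Unicode general category. As a rough sketch of how such a breakdown can be reproduced with Python's standard `unicodedata` module (the helper name `category_counts` and the small sample list are illustrative assumptions, not part of the report or of the profiling tool), one could write:

```python
import unicodedata
from collections import Counter


def category_counts(values):
    """Count Unicode general categories over all characters of the non-missing values."""
    counts = Counter()
    for value in values:
        if value is None:  # missing cells contribute no characters
            continue
        for ch in value:
            # e.g. "Lu" = Uppercase Letter, "Zs" = Space Separator
            counts[unicodedata.category(ch)] += 1
    return counts


# Hypothetical sample using values that occur in LOC_OF_OCCUR_DESC.
sample = ["INSIDE", "FRONT OF", None, "OPPOSITE OF"]
counts = category_counts(sample)
print(counts)
```

The two-letter codes returned by `unicodedata.category` correspond to the long names shown in the report ("Lu" for Uppercase Letter, "Zs" for Space Separator).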

OFNS_DESC
Categorical

HIGH CARDINALITY
HIGH CORRELATION

Distinct: 59
Distinct (%): < 0.1%
Missing: 6
Missing (%): < 0.1%
Memory size: 3.2 MiB

PETIT LARCENY: 82061
HARRASSMENT 2: 66736
CRIMINAL MISCHIEF & RELATED OF: 47271
ASSAULT 3 & RELATED OFFENSES: 43532
GRAND LARCENY: 35482
Other values (54): 138324

Length

Max length: 36
Median length: 13
Mean length: 18.20467773
Min length: 4

Characters and Unicode

Total characters: 7525923
Distinct characters: 35
Distinct categories: 6
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 4
Unique (%): < 0.1%

Sample

1st row: MURDER & NON-NEGL. MANSLAUGHTER
2nd row: MURDER & NON-NEGL. MANSLAUGHTER
3rd row: RAPE
4th row: MURDER & NON-NEGL. MANSLAUGHTER
5th row: MURDER & NON-NEGL. MANSLAUGHTER

Value | Count | Frequency (%)
PETIT LARCENY | 82061 | 19.8%
HARRASSMENT 2 | 66736 | 16.1%
CRIMINAL MISCHIEF & RELATED OF | 47271 | 11.4%
ASSAULT 3 & RELATED OFFENSES | 43532 | 10.5%
GRAND LARCENY | 35482 | 8.6%
FELONY ASSAULT | 20554 | 5.0%
OFF. AGNST PUB ORD SENSBLTY & | 15685 | 3.8%
BURGLARY | 15468 | 3.7%
ROBBERY | 13100 | 3.2%
MISCELLANEOUS PENAL LAW | 12765 | 3.1%
Other values (49) | 60752 | 14.7%
Histogram of lengths of the category
ValueCountFrequency (%)
larceny126745
 
10.5%
109723
 
9.1%
related91742
 
7.6%
petit82224
 
6.8%
harrassment66736
 
5.5%
266736
 
5.5%
assault64086
 
5.3%
of59233
 
4.9%
offenses51788
 
4.3%
criminal49410
 
4.1%
Other values (104)440054
36.4%

Most occurring characters

ValueCountFrequency (%)
795071
 
10.6%
E788876
 
10.5%
A721832
 
9.6%
R588569
 
7.8%
S556691
 
7.4%
N470878
 
6.3%
L470049
 
6.2%
T463775
 
6.2%
I363745
 
4.8%
F286247
 
3.8%
Other values (25)2020190
26.8%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter6490518
86.2%
Space Separator795071
 
10.6%
Other Punctuation126136
 
1.7%
Decimal Number110272
 
1.5%
Dash Punctuation3661
 
< 0.1%
Open Punctuation265
 
< 0.1%

Most frequent character per category

ValueCountFrequency (%)
E788876
12.2%
A721832
11.1%
R588569
 
9.1%
S556691
 
8.6%
N470878
 
7.3%
L470049
 
7.2%
T463775
 
7.1%
I363745
 
5.6%
F286247
 
4.4%
C274897
 
4.2%
Other values (15)1504959
23.2%
ValueCountFrequency (%)
&109723
87.0%
.16152
 
12.8%
'229
 
0.2%
/18
 
< 0.1%
,14
 
< 0.1%
ValueCountFrequency (%)
266736
60.5%
343536
39.5%
ValueCountFrequency (%)
795071
100.0%
ValueCountFrequency (%)
-3661
100.0%
ValueCountFrequency (%)
(265
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin6490518
86.2%
Common1035405
 
13.8%

Most frequent character per script

ValueCountFrequency (%)
E788876
12.2%
A721832
11.1%
R588569
 
9.1%
S556691
 
8.6%
N470878
 
7.3%
L470049
 
7.2%
T463775
 
7.1%
I363745
 
5.6%
F286247
 
4.4%
C274897
 
4.2%
Other values (15)1504959
23.2%
ValueCountFrequency (%)
795071
76.8%
&109723
 
10.6%
266736
 
6.4%
343536
 
4.2%
.16152
 
1.6%
-3661
 
0.4%
(265
 
< 0.1%
'229
 
< 0.1%
/18
 
< 0.1%
,14
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII7525923
100.0%

Most frequent character per block

ValueCountFrequency (%)
795071
 
10.6%
E788876
 
10.5%
A721832
 
9.6%
R588569
 
7.8%
S556691
 
7.4%
N470878
 
6.3%
L470049
 
6.2%
T463775
 
6.2%
I363745
 
4.8%
F286247
 
3.8%
Other values (25)2020190
26.8%

PARKS_NM
Categorical

HIGH CARDINALITY
MISSING

Distinct: 508
Distinct (%): 19.0%
Missing: 410736
Missing (%): 99.4%
Memory size: 3.2 MiB

WASHINGTON SQUARE PARK: 200
CENTRAL PARK: 184
FLUSHING MEADOWS CORONA PARK: 100
CONEY ISLAND BEACH & BOARDWALK: 70
UNION SQUARE PARK: 66
Other values (503): 2056

Length

Max length: 59
Median length: 17
Mean length: 18.51606876
Min length: 7

Characters and Unicode

Total characters: 49549
Distinct characters: 45
Distinct categories: 7
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 246
Unique (%): 9.2%

Sample

1st row: MARCUS GARVEY PARK
2nd row: BROOKVILLE PARK
3rd row: WAYANDA PARK
4th row: ASPHALT GREEN
5th row: WASHINGTON SQUARE PARK

Value | Count | Frequency (%)
WASHINGTON SQUARE PARK | 200 | < 0.1%
CENTRAL PARK | 184 | < 0.1%
FLUSHING MEADOWS CORONA PARK | 100 | < 0.1%
CONEY ISLAND BEACH & BOARDWALK | 70 | < 0.1%
UNION SQUARE PARK | 66 | < 0.1%
PROSPECT PARK | 64 | < 0.1%
RIVERSIDE PARK | 64 | < 0.1%
HUDSON RIVER PARK | 51 | < 0.1%
MARCUS GARVEY PARK | 45 | < 0.1%
BRYANT PARK | 40 | < 0.1%
Other values (498) | 1792 | 0.4%
(Missing) | 410736 | 99.4%
Histogram of lengths of the category
ValueCountFrequency (%)
park2045
26.7%
square357
 
4.7%
playground317
 
4.1%
washington208
 
2.7%
central185
 
2.4%
beach126
 
1.6%
corona104
 
1.4%
boardwalk104
 
1.4%
flushing101
 
1.3%
meadows100
 
1.3%
Other values (676)4024
52.5%

Most occurring characters

ValueCountFrequency (%)
A5782
 
11.7%
R5475
 
11.0%
4995
 
10.1%
E3207
 
6.5%
N3057
 
6.2%
P2804
 
5.7%
O2664
 
5.4%
K2544
 
5.1%
S2259
 
4.6%
L1896
 
3.8%
Other values (35)14866
30.0%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter44059
88.9%
Space Separator4995
 
10.1%
Other Punctuation399
 
0.8%
Decimal Number35
 
0.1%
Open Punctuation21
 
< 0.1%
Close Punctuation21
 
< 0.1%
Dash Punctuation19
 
< 0.1%

Most frequent character per category

ValueCountFrequency (%)
A5782
13.1%
R5475
12.4%
E3207
 
7.3%
N3057
 
6.9%
P2804
 
6.4%
O2664
 
6.0%
K2544
 
5.8%
S2259
 
5.1%
L1896
 
4.3%
I1697
 
3.9%
Other values (16)12674
28.8%
ValueCountFrequency (%)
48
22.9%
17
20.0%
56
17.1%
24
11.4%
73
 
8.6%
92
 
5.7%
82
 
5.7%
62
 
5.7%
31
 
2.9%
ValueCountFrequency (%)
.202
50.6%
'81
20.3%
&76
 
19.0%
/31
 
7.8%
"8
 
2.0%
,1
 
0.3%
ValueCountFrequency (%)
4995
100.0%
ValueCountFrequency (%)
-19
100.0%
ValueCountFrequency (%)
(21
100.0%
ValueCountFrequency (%)
)21
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin44059
88.9%
Common5490
 
11.1%

Most frequent character per script

ValueCountFrequency (%)
A5782
13.1%
R5475
12.4%
E3207
 
7.3%
N3057
 
6.9%
P2804
 
6.4%
O2664
 
6.0%
K2544
 
5.8%
S2259
 
5.1%
L1896
 
4.3%
I1697
 
3.9%
Other values (16)12674
28.8%
ValueCountFrequency (%)
4995
91.0%
.202
 
3.7%
'81
 
1.5%
&76
 
1.4%
/31
 
0.6%
(21
 
0.4%
)21
 
0.4%
-19
 
0.3%
48
 
0.1%
"8
 
0.1%
Other values (9)28
 
0.5%

Most occurring blocks

ValueCountFrequency (%)
ASCII49549
100.0%

Most frequent character per block

ValueCountFrequency (%)
A5782
 
11.7%
R5475
 
11.0%
4995
 
10.1%
E3207
 
6.5%
N3057
 
6.2%
P2804
 
5.7%
O2664
 
5.4%
K2544
 
5.1%
S2259
 
4.6%
L1896
 
3.8%
Other values (35)14866
30.0%

PATROL_BORO
Categorical

HIGH CORRELATION

Distinct: 8
Distinct (%): < 0.1%
Missing: 463
Missing (%): 0.1%
Memory size: 3.2 MiB

PATROL BORO BRONX: 90442
PATROL BORO BKLYN NORTH: 60579
PATROL BORO BKLYN SOUTH: 58634
PATROL BORO MAN NORTH: 51366
PATROL BORO QUEENS NORTH: 46213
Other values (3): 105715

Length

Max length: 25
Median length: 23
Mean length: 21.51259841
Min length: 17

Characters and Unicode

Total characters: 8883606
Distinct characters: 20
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: PATROL BORO BRONX
2nd row: PATROL BORO BRONX
3rd row: PATROL BORO BRONX
4th row: PATROL BORO BRONX
5th row: PATROL BORO QUEENS SOUTH

Value | Count | Frequency (%)
PATROL BORO BRONX | 90442 | 21.9%
PATROL BORO BKLYN NORTH | 60579 | 14.7%
PATROL BORO BKLYN SOUTH | 58634 | 14.2%
PATROL BORO MAN NORTH | 51366 | 12.4%
PATROL BORO QUEENS NORTH | 46213 | 11.2%
PATROL BORO MAN SOUTH | 45916 | 11.1%
PATROL BORO QUEENS SOUTH | 42816 | 10.4%
PATROL BORO STATEN ISLAND | 16983 | 4.1%
(Missing) | 463 | 0.1%
Histogram of lengths of the category
ValueCountFrequency (%)
patrol412949
26.4%
boro412949
26.4%
north158158
 
10.1%
south147366
 
9.4%
bklyn119213
 
7.6%
man97282
 
6.2%
bronx90442
 
5.8%
queens89029
 
5.7%
island16983
 
1.1%
staten16983
 
1.1%

Most occurring characters

ValueCountFrequency (%)
O1634813
18.4%
1148405
12.9%
R1074498
12.1%
T752439
8.5%
B622604
 
7.0%
N588090
 
6.6%
L549145
 
6.2%
A544197
 
6.1%
P412949
 
4.6%
H305524
 
3.4%
Other values (10)1250942
14.1%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter7735201
87.1%
Space Separator1148405
 
12.9%

Most frequent character per category

ValueCountFrequency (%)
O1634813
21.1%
R1074498
13.9%
T752439
9.7%
B622604
 
8.0%
N588090
 
7.6%
L549145
 
7.1%
A544197
 
7.0%
P412949
 
5.3%
H305524
 
3.9%
S270361
 
3.5%
Other values (9)980581
12.7%
ValueCountFrequency (%)
1148405
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin7735201
87.1%
Common1148405
 
12.9%

Most frequent character per script

ValueCountFrequency (%)
O1634813
21.1%
R1074498
13.9%
T752439
9.7%
B622604
 
8.0%
N588090
 
7.6%
L549145
 
7.1%
A544197
 
7.0%
P412949
 
5.3%
H305524
 
3.9%
S270361
 
3.5%
Other values (9)980581
12.7%
ValueCountFrequency (%)
1148405
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII8883606
100.0%

Most frequent character per block

ValueCountFrequency (%)
O1634813
18.4%
1148405
12.9%
R1074498
12.1%
T752439
8.5%
B622604
 
7.0%
N588090
 
6.6%
L549145
 
6.2%
A544197
 
6.1%
P412949
 
4.6%
H305524
 
3.4%
Other values (10)1250942
14.1%

PD_CD
Real number (ℝ≥0)

Distinct: 345
Distinct (%): 0.1%
Missing: 463
Missing (%): 0.1%
Infinite: 0
Infinite (%): 0.0%
Mean: 390.6094578
Minimum: 100
Maximum: 922
Zeros: 0
Zeros (%): 0.0%
Memory size: 3.2 MiB

Quantile statistics

Minimum: 100
5-th percentile: 101
Q1: 254
median: 343
Q3: 637
95-th percentile: 748
Maximum: 922
Range: 822
Interquartile range (IQR): 383

Descriptive statistics

Standard deviation: 210.3504185
Coefficient of variation (CV): 0.5385184979
Kurtosis: -0.6739598702
Mean: 390.6094578
Median Absolute Deviation (MAD): 135
Skewness: 0.4442739057
Sum: 161301785
Variance: 44247.29856
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
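
Several of the PD_CD statistics above are derived from the others, so they can be cross-checked by hand. The sketch below redoes that arithmetic from the reported figures only (413412 observations and 463 missing values); the variable names are illustrative, and the results match the report only up to the rounding of the printed inputs.

```python
# Figures copied from the PD_CD summary above.
q1, q3 = 254, 637
minimum, maximum = 100, 922
mean, std = 390.6094578, 210.3504185
total = 161301785
n_observed = 413412 - 463  # observations minus missing values

iqr = q3 - q1                     # interquartile range
value_range = maximum - minimum   # range
cv = std / mean                   # coefficient of variation
variance = std ** 2               # variance is the squared standard deviation
mean_from_sum = total / n_observed

print(iqr, value_range, round(cv, 6), round(variance, 2), round(mean_from_sum, 4))
```

Because PD_CD is a classification code rather than a measured quantity, these statistics describe the distribution of codes and carry no physical meaning.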
Value | Count | Frequency (%)
638 | 47900 | 11.6%
101 | 33373 | 8.1%
333 | 28168 | 6.8%
637 | 18836 | 4.6%
109 | 15861 | 3.8%
639 | 14444 | 3.5%
254 | 14002 | 3.4%
321 | 12842 | 3.1%
259 | 12555 | 3.0%
352 | 9983 | 2.4%
Other values (335) | 204985 | 49.6%

Value | Count | Frequency (%)
100 | 8 | < 0.1%
101 | 33373 | 8.1%
102 | 13 | < 0.1%
103 | 36 | < 0.1%
104 | 18 | < 0.1%

Value | Count | Frequency (%)
922 | 244 | 0.1%
918 | 16 | < 0.1%
916 | 6056 | 1.5%
907 | 32 | < 0.1%
905 | 2349 | 0.6%

PD_DESC
Categorical

HIGH CARDINALITY

Distinct: 336
Distinct (%): 0.1%
Missing: 463
Missing (%): 0.1%
Memory size: 3.2 MiB

HARASSMENT,SUBD 3,4,5: 47900
ASSAULT 3: 33373
LARCENY,PETIT FROM STORE-SHOPL: 28168
HARASSMENT,SUBD 1,CIVILIAN: 18836
ASSAULT 2,1,UNCLASSIFIED: 15861
Other values (331): 268811

Length

Max length: 71
Median length: 26
Mean length: 26.59849521
Min length: 6

Characters and Unicode

Total characters: 10983822
Distinct characters: 40
Distinct categories: 7
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 27
Unique (%): < 0.1%

Sample

1st row: RAPE 1
2nd row: WEAPONS POSSESSION 3
3rd row: HARASSMENT,SUBD 3,4,5
4th row: ARSON 1
5th row: FORGERY,ETC.-MISD.

Value | Count | Frequency (%)
HARASSMENT,SUBD 3,4,5 | 47900 | 11.6%
ASSAULT 3 | 33373 | 8.1%
LARCENY,PETIT FROM STORE-SHOPL | 28168 | 6.8%
HARASSMENT,SUBD 1,CIVILIAN | 18836 | 4.6%
ASSAULT 2,1,UNCLASSIFIED | 15861 | 3.8%
AGGRAVATED HARASSMENT 2 | 14444 | 3.5%
MISCHIEF, CRIMINAL 4, OF MOTOR | 14002 | 3.4%
LARCENY,PETIT FROM AUTO | 12842 | 3.1%
CRIMINAL MISCHIEF,UNCLASSIFIED 4 | 12555 | 3.0%
LARCENY,PETIT FROM BUILDING,UNATTENDED, PACKAGE THEFT INSIDE | 9983 | 2.4%
Other values (326) | 204985 | 49.6%
Histogram of lengths of the category
ValueCountFrequency (%)
from87018
 
7.1%
larceny,petit81807
 
6.7%
harassment,subd66736
 
5.5%
of52734
 
4.3%
criminal51321
 
4.2%
assault50818
 
4.2%
3,4,547900
 
3.9%
344699
 
3.7%
larceny,grand44063
 
3.6%
store-shopl30919
 
2.5%
Other values (476)665877
54.4%

Most occurring characters

ValueCountFrequency (%)
E888683
 
8.1%
A880169
 
8.0%
830564
 
7.6%
S746812
 
6.8%
I710698
 
6.5%
N688398
 
6.3%
R688291
 
6.3%
T630642
 
5.7%
,550538
 
5.0%
C541448
 
4.9%
Other values (30)3827579
34.8%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter9171067
83.5%
Space Separator830564
 
7.6%
Other Punctuation580346
 
5.3%
Decimal Number337236
 
3.1%
Dash Punctuation56782
 
0.5%
Open Punctuation4015
 
< 0.1%
Close Punctuation3812
 
< 0.1%

Most frequent character per category

ValueCountFrequency (%)
E888683
 
9.7%
A880169
 
9.6%
S746812
 
8.1%
I710698
 
7.7%
N688398
 
7.5%
R688291
 
7.5%
T630642
 
6.9%
C541448
 
5.9%
L508484
 
5.5%
O454811
 
5.0%
Other values (16)2432631
26.5%
ValueCountFrequency (%)
396291
28.6%
485730
25.4%
153953
16.0%
251774
15.4%
549380
14.6%
7108
 
< 0.1%
ValueCountFrequency (%)
,550538
94.9%
/14794
 
2.5%
&9128
 
1.6%
.5886
 
1.0%
ValueCountFrequency (%)
830564
100.0%
ValueCountFrequency (%)
-56782
100.0%
ValueCountFrequency (%)
(4015
100.0%
ValueCountFrequency (%)
)3812
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin9171067
83.5%
Common1812755
 
16.5%

Most frequent character per script

ValueCountFrequency (%)
E888683
 
9.7%
A880169
 
9.6%
S746812
 
8.1%
I710698
 
7.7%
N688398
 
7.5%
R688291
 
7.5%
T630642
 
6.9%
C541448
 
5.9%
L508484
 
5.5%
O454811
 
5.0%
Other values (16)2432631
26.5%
ValueCountFrequency (%)
830564
45.8%
,550538
30.4%
396291
 
5.3%
485730
 
4.7%
-56782
 
3.1%
153953
 
3.0%
251774
 
2.9%
549380
 
2.7%
/14794
 
0.8%
&9128
 
0.5%
Other values (4)13821
 
0.8%

Most occurring blocks

ValueCountFrequency (%)
ASCII10983822
100.0%

Most frequent character per block

ValueCountFrequency (%)
E888683
 
8.1%
A880169
 
8.0%
830564
 
7.6%
S746812
 
6.8%
I710698
 
6.5%
N688398
 
6.3%
R688291
 
6.3%
T630642
 
5.7%
,550538
 
5.0%
C541448
 
4.9%
Other values (30)3827579
34.8%

PREM_TYP_DESC
Categorical

HIGH CARDINALITY

Distinct: 74
Distinct (%): < 0.1%
Missing: 1172
Missing (%): 0.3%
Memory size: 3.2 MiB

STREET: 123143
RESIDENCE - APT. HOUSE: 99028
RESIDENCE-HOUSE: 44419
RESIDENCE - PUBLIC HOUSING: 30613
CHAIN STORE: 14703
Other values (69): 100334

Length

Max length: 28
Median length: 15
Mean length: 14.54387978
Min length: 3

Characters and Unicode

Total characters: 5995569
Distinct characters: 32
Distinct categories: 6
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: STREET
2nd row: STREET
3rd row: RESIDENCE - APT. HOUSE
4th row: OTHER
5th row: STREET

Value | Count | Frequency (%)
STREET | 123143 | 29.8%
RESIDENCE - APT. HOUSE | 99028 | 24.0%
RESIDENCE-HOUSE | 44419 | 10.7%
RESIDENCE - PUBLIC HOUSING | 30613 | 7.4%
CHAIN STORE | 14703 | 3.6%
OTHER | 9283 | 2.2%
COMMERCIAL BUILDING | 8933 | 2.2%
DRUG STORE | 8594 | 2.1%
GROCERY/BODEGA | 7194 | 1.7%
TRANSIT - NYC SUBWAY | 7141 | 1.7%
Other values (64) | 59189 | 14.3%
Histogram of lengths of the category
ValueCountFrequency (%)
137878
15.3%
residence129641
14.4%
street123143
13.6%
house99058
11.0%
apt99028
11.0%
residence-house44419
 
4.9%
public34962
 
3.9%
store33611
 
3.7%
housing30613
 
3.4%
chain14703
 
1.6%
Other values (92)155139
17.2%

Most occurring characters

ValueCountFrequency (%)
E1050619
17.5%
S559334
 
9.3%
489955
 
8.2%
T475201
 
7.9%
R440756
 
7.4%
I334220
 
5.6%
N291622
 
4.9%
O290828
 
4.9%
C276778
 
4.6%
U262111
 
4.4%
Other values (22)1524145
25.4%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter5179898
86.4%
Space Separator489955
 
8.2%
Dash Punctuation181201
 
3.0%
Other Punctuation132395
 
2.2%
Open Punctuation6060
 
0.1%
Close Punctuation6060
 
0.1%

Most frequent character per category

ValueCountFrequency (%)
E1050619
20.3%
S559334
10.8%
T475201
9.2%
R440756
8.5%
I334220
 
6.5%
N291622
 
5.6%
O290828
 
5.6%
C276778
 
5.3%
U262111
 
5.1%
D227371
 
4.4%
Other values (15)971058
18.7%
ValueCountFrequency (%)
.99833
75.4%
/31466
 
23.8%
&1096
 
0.8%
ValueCountFrequency (%)
489955
100.0%
ValueCountFrequency (%)
-181201
100.0%
ValueCountFrequency (%)
(6060
100.0%
ValueCountFrequency (%)
)6060
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin5179898
86.4%
Common815671
 
13.6%

Most frequent character per script

ValueCountFrequency (%)
E1050619
20.3%
S559334
10.8%
T475201
9.2%
R440756
8.5%
I334220
 
6.5%
N291622
 
5.6%
O290828
 
5.6%
C276778
 
5.3%
U262111
 
5.1%
D227371
 
4.4%
Other values (15)971058
18.7%
ValueCountFrequency (%)
489955
60.1%
-181201
 
22.2%
.99833
 
12.2%
/31466
 
3.9%
(6060
 
0.7%
)6060
 
0.7%
&1096
 
0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII5995569
100.0%

Most frequent character per block

ValueCountFrequency (%)
E1050619
17.5%
S559334
 
9.3%
489955
 
8.2%
T475201
 
7.9%
R440756
 
7.4%
I334220
 
5.6%
N291622
 
4.9%
O290828
 
4.9%
C276778
 
4.6%
U262111
 
4.4%
Other values (22)1524145
25.4%

RPT_DT
Categorical

HIGH CARDINALITY

Distinct: 366
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 3.2 MiB

06/02/2020: 1502
01/15/2020: 1500
01/14/2020: 1476
03/11/2020: 1441
10/21/2020: 1429
Other values (361): 406064

Length

Max length: 10
Median length: 10
Mean length: 10
Min length: 10

Characters and Unicode

Total characters: 4134120
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 12/23/2020
2nd row: 12/21/2020
3rd row: 11/23/2020
4th row: 11/22/2020
5th row: 11/21/2020

Value | Count | Frequency (%)
06/02/2020 | 1502 | 0.4%
01/15/2020 | 1500 | 0.4%
01/14/2020 | 1476 | 0.4%
03/11/2020 | 1441 | 0.3%
10/21/2020 | 1429 | 0.3%
02/05/2020 | 1420 | 0.3%
09/08/2020 | 1417 | 0.3%
02/03/2020 | 1416 | 0.3%
01/02/2020 | 1415 | 0.3%
02/18/2020 | 1412 | 0.3%
Other values (356) | 398984 | 96.5%
Histogram of lengths of the category
ValueCountFrequency (%)
06/02/20201502
 
0.4%
01/15/20201500
 
0.4%
01/14/20201476
 
0.4%
03/11/20201441
 
0.3%
10/21/20201429
 
0.3%
02/05/20201420
 
0.3%
09/08/20201417
 
0.3%
02/03/20201416
 
0.3%
01/02/20201415
 
0.3%
02/18/20201412
 
0.3%
Other values (356)398984
96.5%

Most occurring characters

ValueCountFrequency (%)
01334142
32.3%
21072786
25.9%
/826824
20.0%
1366157
 
8.9%
393678
 
2.3%
878591
 
1.9%
776493
 
1.9%
976006
 
1.8%
572256
 
1.7%
672121
 
1.7%

Most occurring categories

ValueCountFrequency (%)
Decimal Number3307296
80.0%
Other Punctuation826824
 
20.0%

Most frequent character per category

ValueCountFrequency (%)
01334142
40.3%
21072786
32.4%
1366157
 
11.1%
393678
 
2.8%
878591
 
2.4%
776493
 
2.3%
976006
 
2.3%
572256
 
2.2%
672121
 
2.2%
465066
 
2.0%
ValueCountFrequency (%)
/826824
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common4134120
100.0%

Most frequent character per script

ValueCountFrequency (%)
01334142
32.3%
21072786
25.9%
/826824
20.0%
1366157
 
8.9%
393678
 
2.3%
878591
 
1.9%
776493
 
1.9%
976006
 
1.8%
572256
 
1.7%
672121
 
1.7%

Most occurring blocks

ValueCountFrequency (%)
ASCII4134120
100.0%

Most frequent character per block

ValueCountFrequency (%)
01334142
32.3%
21072786
25.9%
/826824
20.0%
1366157
 
8.9%
393678
 
2.3%
878591
 
1.9%
776493
 
1.9%
976006
 
1.8%
572256
 
1.7%
672121
 
1.7%
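
RPT_DT is profiled as a categorical string even though every value is a 10-character date. The sample rows (for example 12/23/2020) suggest a month/day/year layout, and the 366 distinct values are consistent with one entry per day of 2020. A minimal sketch of converting such strings to real dates before analysis, assuming that `%m/%d/%Y` layout holds for every row (the helper `parse_report_date` is a hypothetical name), could look like this:

```python
from collections import Counter
from datetime import datetime


def parse_report_date(text):
    """Parse an RPT_DT string, assuming a month/day/year layout."""
    return datetime.strptime(text, "%m/%d/%Y").date()


# The five sample rows shown for RPT_DT above.
samples = ["12/23/2020", "12/21/2020", "11/23/2020", "11/22/2020", "11/21/2020"]
dates = [parse_report_date(s) for s in samples]

# Once parsed, reports can be grouped by calendar month.
per_month = Counter(d.month for d in dates)
print(dates[0].isoformat(), dict(per_month))
```

With real date values the column can be treated as a time axis instead of a 366-level category, which would also remove the high-cardinality warning.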

STATION_NAME
Categorical

HIGH CARDINALITY
MISSING

Distinct: 362
Distinct (%): 5.0%
Missing: 406179
Missing (%): 98.3%
Memory size: 3.2 MiB

125 STREET: 250
34 ST.-PENN STATION: 148
14 STREET: 136
42 ST.-TIMES SQUARE: 133
59 ST.-COLUMBUS CIRCLE: 126
Other values (357): 6440

Length

Max length: 30
Median length: 14
Mean length: 15.60320752
Min length: 6

Characters and Unicode

Total characters: 112858
Distinct characters: 42
Distinct categories: 5
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 12
Unique (%): 0.2%

Sample

1st row: 34 ST.-PENN STATION
2nd row: WINTHROP STREET
3rd row: PACIFIC STREET
4th row: MAIN ST.-FLUSHING
5th row: GRAND AVE.-NEWTON

Value | Count | Frequency (%)
125 STREET | 250 | 0.1%
34 ST.-PENN STATION | 148 | < 0.1%
14 STREET | 136 | < 0.1%
42 ST.-TIMES SQUARE | 133 | < 0.1%
59 ST.-COLUMBUS CIRCLE | 126 | < 0.1%
42 ST.-PORT AUTHORITY BUS TERM | 117 | < 0.1%
161 ST.-YANKEE STADIUM | 97 | < 0.1%
34 ST.-HERALD SQ. | 95 | < 0.1%
UTICA AVE.-CROWN HEIGHTS | 95 | < 0.1%
W. 4 STREET | 88 | < 0.1%
Other values (352) | 5948 | 1.4%
(Missing) | 406179 | 98.3%
Histogram of lengths of the category
ValueCountFrequency (%)
street2581
 
14.9%
avenue1335
 
7.7%
42395
 
2.3%
34359
 
2.1%
125250
 
1.4%
square216
 
1.2%
east193
 
1.1%
59192
 
1.1%
road179
 
1.0%
st.-grand176
 
1.0%
Other values (433)11426
66.0%

Most occurring characters

ValueCountFrequency (%)
E14002
 
12.4%
T11195
 
9.9%
10069
 
8.9%
S8183
 
7.3%
R7768
 
6.9%
A7598
 
6.7%
N5838
 
5.2%
O4144
 
3.7%
U3734
 
3.3%
L3179
 
2.8%
Other values (32)37148
32.9%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter89056
78.9%
Space Separator10069
 
8.9%
Decimal Number8626
 
7.6%
Other Punctuation2704
 
2.4%
Dash Punctuation2403
 
2.1%

Most frequent character per category

ValueCountFrequency (%)
E14002
15.7%
T11195
12.6%
S8183
 
9.2%
R7768
 
8.7%
A7598
 
8.5%
N5838
 
6.6%
O4144
 
4.7%
U3734
 
4.2%
L3179
 
3.6%
I2812
 
3.2%
Other values (16)20603
23.1%
ValueCountFrequency (%)
11908
22.1%
41436
16.6%
21054
12.2%
5879
10.2%
3842
9.8%
6592
 
6.9%
7573
 
6.6%
9534
 
6.2%
8435
 
5.0%
0373
 
4.3%
ValueCountFrequency (%)
.2399
88.7%
/240
 
8.9%
"56
 
2.1%
'9
 
0.3%
ValueCountFrequency (%)
10069
100.0%
ValueCountFrequency (%)
-2403
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin89056
78.9%
Common23802
 
21.1%

Most frequent character per script

ValueCountFrequency (%)
E14002
15.7%
T11195
12.6%
S8183
 
9.2%
R7768
 
8.7%
A7598
 
8.5%
N5838
 
6.6%
O4144
 
4.7%
U3734
 
4.2%
L3179
 
3.6%
I2812
 
3.2%
Other values (16)20603
23.1%
ValueCountFrequency (%)
10069
42.3%
-2403
 
10.1%
.2399
 
10.1%
11908
 
8.0%
41436
 
6.0%
21054
 
4.4%
5879
 
3.7%
3842
 
3.5%
6592
 
2.5%
7573
 
2.4%
Other values (6)1647
 
6.9%

Most occurring blocks

ValueCountFrequency (%)
ASCII112858
100.0%

Most frequent character per block

ValueCountFrequency (%)
E14002
 
12.4%
T11195
 
9.9%
10069
 
8.9%
S8183
 
7.3%
R7768
 
6.9%
A7598
 
6.7%
N5838
 
5.2%
O4144
 
3.7%
U3734
 
3.3%
L3179
 
2.8%
Other values (32)37148
32.9%

SUSP_AGE_GROUP
Categorical

MISSING

Distinct: 17
Distinct (%): < 0.1%
Missing: 94862
Missing (%): 22.9%
Memory size: 3.2 MiB

UNKNOWN: 144594
25-44: 100390
45-64: 33930
18-24: 29666
<18: 6632
Other values (12): 3338

Length

Max length: 7
Median length: 5
Mean length: 5.845289593
Min length: 3

Characters and Unicode

Total characters: 1862017
Distinct characters: 17
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 9
Unique (%): < 0.1%

Sample

1st row: UNKNOWN
2nd row: 25-44
3rd row: 25-44
4th row: <18
5th row: 25-44

Value | Count | Frequency (%)
UNKNOWN | 144594 | 35.0%
25-44 | 100390 | 24.3%
45-64 | 33930 | 8.2%
18-24 | 29666 | 7.2%
<18 | 6632 | 1.6%
65+ | 3317 | 0.8%
2020 | 10 | < 0.1%
2019 | 2 | < 0.1%
1925 | 1 | < 0.1%
-977 | 1 | < 0.1%
Other values (7) | 7 | < 0.1%
(Missing) | 94862 | 22.9%
Histogram of lengths of the category
ValueCountFrequency (%)
unknown144594
45.4%
25-44100390
31.5%
45-6433930
 
10.7%
18-2429666
 
9.3%
186632
 
2.1%
653317
 
1.0%
202010
 
< 0.1%
20192
 
< 0.1%
9421
 
< 0.1%
10201
 
< 0.1%
Other values (7)7
 
< 0.1%

Most occurring characters

ValueCountFrequency (%)
N433782
23.3%
4298307
16.0%
-163993
 
8.8%
U144594
 
7.8%
K144594
 
7.8%
O144594
 
7.8%
W144594
 
7.8%
5137639
 
7.4%
2130084
 
7.0%
637249
 
2.0%
Other values (7)82587
 
4.4%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter1012158
54.4%
Decimal Number675917
36.3%
Dash Punctuation163993
 
8.8%
Math Symbol9949
 
0.5%

Most frequent character per category

ValueCountFrequency (%)
4298307
44.1%
5137639
20.4%
2130084
19.2%
637249
 
5.5%
136304
 
5.4%
836299
 
5.4%
024
 
< 0.1%
98
 
< 0.1%
73
 
< 0.1%
ValueCountFrequency (%)
N433782
42.9%
U144594
 
14.3%
K144594
 
14.3%
O144594
 
14.3%
W144594
 
14.3%
ValueCountFrequency (%)
<6632
66.7%
+3317
33.3%
ValueCountFrequency (%)
-163993
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin1012158
54.4%
Common849859
45.6%

Most frequent character per script

ValueCountFrequency (%)
4298307
35.1%
-163993
19.3%
5137639
16.2%
2130084
15.3%
637249
 
4.4%
136304
 
4.3%
836299
 
4.3%
<6632
 
0.8%
+3317
 
0.4%
024
 
< 0.1%
Other values (2)11
 
< 0.1%
ValueCountFrequency (%)
N433782
42.9%
U144594
 
14.3%
K144594
 
14.3%
O144594
 
14.3%
W144594
 
14.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII1862017
100.0%

Most frequent character per block

ValueCountFrequency (%)
N433782
23.3%
4298307
16.0%
-163993
 
8.8%
U144594
 
7.8%
K144594
 
7.8%
O144594
 
7.8%
W144594
 
7.8%
5137639
 
7.4%
2130084
 
7.0%
637249
 
2.0%
Other values (7)82587
 
4.4%
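
The SUSP_AGE_GROUP frequency table above contains a handful of entries that are not age brackets at all (2020, 2019, 1925, -977 and seven other one-off values). One possible way to handle them, sketched here under the assumption that only the six frequent labels are legitimate and that stray values should be folded into UNKNOWN, is a simple whitelist. The names `VALID_AGE_GROUPS` and `clean_age_group` are hypothetical, and whether to recode or drop such rows is an analysis decision the report itself does not make.

```python
# Assumed set of legitimate labels, taken from the six most frequent values.
VALID_AGE_GROUPS = {"<18", "18-24", "25-44", "45-64", "65+", "UNKNOWN"}


def clean_age_group(value):
    """Return a valid age bracket, mapping stray values to UNKNOWN.

    Missing cells (None) are left as missing rather than recoded.
    """
    if value is None:
        return None
    value = value.strip()
    return value if value in VALID_AGE_GROUPS else "UNKNOWN"


raw = ["25-44", "2020", "-977", None, "<18", "UNKNOWN"]
cleaned = [clean_age_group(v) for v in raw]
print(cleaned)
```

The same whitelist would apply to VIC_AGE_GROUP, which shows a similar set of invalid entries.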

SUSP_RACE
Categorical

MISSING

Distinct: 7
Distinct (%): < 0.1%
Missing: 94862
Missing (%): 22.9%
Memory size: 3.2 MiB

BLACK: 115253
UNKNOWN: 96066
WHITE HISPANIC: 50625
WHITE: 29164
BLACK HISPANIC: 15961
Other values (2): 11481

Length

Max length: 30
Median length: 7
Mean length: 8.180847591
Min length: 5

Characters and Unicode

Total characters: 2606009
Distinct characters: 22
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: UNKNOWN
2nd row: BLACK
3rd row: BLACK
4th row: WHITE HISPANIC
5th row: WHITE

Value | Count | Frequency (%)
BLACK | 115253 | 27.9%
UNKNOWN | 96066 | 23.2%
WHITE HISPANIC | 50625 | 12.2%
WHITE | 29164 | 7.1%
BLACK HISPANIC | 15961 | 3.9%
ASIAN / PACIFIC ISLANDER | 10862 | 2.6%
AMERICAN INDIAN/ALASKAN NATIVE | 619 | 0.1%
(Missing) | 94862 | 22.9%
Histogram of lengths of the category
ValueCountFrequency (%)
black131214
31.3%
unknown96066
22.9%
white79789
19.0%
hispanic66586
15.9%
pacific10862
 
2.6%
islander10862
 
2.6%
asian10862
 
2.6%
10862
 
2.6%
american619
 
0.1%
native619
 
0.1%

Most occurring characters

ValueCountFrequency (%)
N379603
14.6%
I258885
9.9%
A245581
9.4%
K227899
 
8.7%
C220143
 
8.4%
W175855
 
6.7%
H146375
 
5.6%
L142695
 
5.5%
B131214
 
5.0%
100410
 
3.9%
Other values (12)577349
22.2%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter2494118
95.7%
Space Separator100410
 
3.9%
Other Punctuation11481
 
0.4%

Most frequent character per category

ValueCountFrequency (%)
N379603
15.2%
I258885
10.4%
A245581
9.8%
K227899
9.1%
C220143
8.8%
W175855
 
7.1%
H146375
 
5.9%
L142695
 
5.7%
B131214
 
5.3%
U96066
 
3.9%
Other values (10)469802
18.8%
ValueCountFrequency (%)
100410
100.0%
ValueCountFrequency (%)
/11481
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin2494118
95.7%
Common111891
 
4.3%

Most frequent character per script

ValueCountFrequency (%)
N379603
15.2%
I258885
10.4%
A245581
9.8%
K227899
9.1%
C220143
8.8%
W175855
 
7.1%
H146375
 
5.9%
L142695
 
5.7%
B131214
 
5.3%
U96066
 
3.9%
Other values (10)469802
18.8%
ValueCountFrequency (%)
100410
89.7%
/11481
 
10.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII2606009
100.0%

Most frequent character per block

ValueCountFrequency (%)
N379603
14.6%
I258885
9.9%
A245581
9.4%
K227899
 
8.7%
C220143
 
8.4%
W175855
 
6.7%
H146375
 
5.6%
L142695
 
5.5%
B131214
 
5.0%
100410
 
3.9%
Other values (12)577349
22.2%

SUSP_SEX
Categorical

MISSING

Distinct: 3
Distinct (%): < 0.1%
Missing: 94862
Missing (%): 22.9%
Memory size: 3.2 MiB

M: 186273
U: 82849
F: 49428

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 318550
Distinct characters: 3
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: U
2nd row: M
3rd row: M
4th row: M
5th row: M

Value | Count | Frequency (%)
M | 186273 | 45.1%
U | 82849 | 20.0%
F | 49428 | 12.0%
(Missing) | 94862 | 22.9%
Histogram of lengths of the category
ValueCountFrequency (%)
m186273
58.5%
u82849
26.0%
f49428
 
15.5%

Most occurring characters

ValueCountFrequency (%)
M186273
58.5%
U82849
26.0%
F49428
 
15.5%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter318550
100.0%

Most frequent character per category

ValueCountFrequency (%)
M186273
58.5%
U82849
26.0%
F49428
 
15.5%

Most occurring scripts

ValueCountFrequency (%)
Latin318550
100.0%

Most frequent character per script

ValueCountFrequency (%)
M186273
58.5%
U82849
26.0%
F49428
 
15.5%

Most occurring blocks

ValueCountFrequency (%)
ASCII318550
100.0%

Most frequent character per block

ValueCountFrequency (%)
M186273
58.5%
U82849
26.0%
F49428
 
15.5%

TRANSIT_DISTRICT
Real number (ℝ≥0)

MISSING

Distinct: 12
Distinct (%): 0.2%
Missing: 406179
Missing (%): 98.3%
Infinite: 0
Infinite (%): 0.0%
Mean: 13.85275819
Minimum: 1
Maximum: 34
Zeros: 0
Zeros (%): 0.0%
Memory size: 3.2 MiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 2
median: 11
Q3: 30
95-th percentile: 33
Maximum: 34
Range: 33
Interquartile range (IQR): 28

Descriptive statistics

Standard deviation: 12.57269892
Coefficient of variation (CV): 0.9075953503
Kurtosis: -1.394793232
Mean: 13.85275819
Median Absolute Deviation (MAD): 9
Skewness: 0.5144579587
Sum: 100197
Variance: 158.0727582
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=12)
Value | Count | Frequency (%)
2 | 1023 | 0.2%
4 | 881 | 0.2%
1 | 830 | 0.2%
20 | 653 | 0.2%
3 | 623 | 0.2%
33 | 597 | 0.1%
11 | 593 | 0.1%
32 | 583 | 0.1%
12 | 562 | 0.1%
30 | 462 | 0.1%
Other values (2) | 426 | 0.1%
(Missing) | 406179 | 98.3%

Value | Count | Frequency (%)
1 | 830 | 0.2%
2 | 1023 | 0.2%
3 | 623 | 0.2%
4 | 881 | 0.2%
11 | 593 | 0.1%

Value | Count | Frequency (%)
34 | 326 | 0.1%
33 | 597 | 0.1%
32 | 583 | 0.1%
30 | 462 | 0.1%
23 | 100 | < 0.1%

VIC_AGE_GROUP
Categorical

Distinct      26
Distinct (%)  < 0.1%
Missing       1
Missing (%)   < 0.1%
Memory size   3.2 MiB

25-44              159768
UNKNOWN            100179
45-64              84566
18-24              37700
65+                18941
Other values (21)  12257

Length

Max length     7
Median length  5
Mean length    5.333735677
Min length     2

Characters and Unicode

Total characters     2205025
Distinct characters  18
Distinct categories  4
Distinct scripts     2
Distinct blocks      1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
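
As a rough illustration of what "character properties" means here, the sketch below uses Python's standard `unicodedata` module to count the Unicode general category of every character in a few values of this column. The sample list and the helper name `category_counts` are chosen for illustration only; this is not the profiling tool's own code.

```python
import unicodedata
from collections import Counter


def category_counts(values):
    """Count Unicode general categories over all characters in the given strings."""
    counts = Counter()
    for value in values:
        for ch in value:
            counts[unicodedata.category(ch)] += 1
    return counts


# A few values that appear in VIC_AGE_GROUP.
sample = ["25-44", "UNKNOWN", "65+", "<18"]
counts = category_counts(sample)

# Category codes: Nd = Decimal Number, Lu = Uppercase Letter,
# Pd = Dash Punctuation, Sm = Math Symbol.
for code, count in sorted(counts.items()):
    print(code, count)
```

These four codes correspond to the four categories listed for this variable: digits, uppercase letters, the hyphen in ranges such as 25-44, and the `+` and `<` symbols.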

Unique

Unique      19
Unique (%)  < 0.1%

Sample

1st row  18-24
2nd row  25-44
3rd row  25-44
4th row  25-44
5th row  18-24

Value              Count   Frequency (%)
25-44              159768  38.6%
UNKNOWN            100179  24.2%
45-64              84566   20.5%
18-24              37700   9.1%
65+                18941   4.6%
<18                12236   3.0%
-948               2       < 0.1%
-963               1       < 0.1%
-958               1       < 0.1%
950                1       < 0.1%
Other values (16)  16      < 0.1%
Histogram of lengths of the category
Value              Count   Frequency (%)
25-44              159768  38.6%
unknown            100179  24.2%
45-64              84566   20.5%
18-24              37700   9.1%
65                 18941   4.6%
18                 12236   3.0%
938                2       < 0.1%
948                2       < 0.1%
968                1       < 0.1%
973                1       < 0.1%
Other values (15)  15      < 0.1%

Most occurring characters

Value             Count   Frequency (%)
4                 526374  23.9%
N                 300537  13.6%
-                 282048  12.8%
5                 263278  11.9%
2                 197470  9.0%
6                 103513  4.7%
U                 100179  4.5%
K                 100179  4.5%
O                 100179  4.5%
W                 100179  4.5%
Other values (8)  131089  5.9%

Most occurring categories

Value             Count    Frequency (%)
Decimal Number    1190547  54.0%
Uppercase Letter  701253   31.8%
Dash Punctuation  282048   12.8%
Math Symbol       31177    1.4%

Most frequent character per category

Value  Count   Frequency (%)
4      526374  44.2%
5      263278  22.1%
2      197470  16.6%
6      103513  8.7%
1      49942   4.2%
8      49942   4.2%
9      15      < 0.1%
3      7       < 0.1%
0      4       < 0.1%
7      2       < 0.1%

Value  Count   Frequency (%)
N      300537  42.9%
U      100179  14.3%
K      100179  14.3%
O      100179  14.3%
W      100179  14.3%

Value  Count  Frequency (%)
+      18941  60.8%
<      12236  39.2%

Value  Count   Frequency (%)
-      282048  100.0%

Most occurring scripts

Value   Count    Frequency (%)
Common  1503772  68.2%
Latin   701253   31.8%

Most frequent character per script

Value             Count   Frequency (%)
4                 526374  35.0%
-                 282048  18.8%
5                 263278  17.5%
2                 197470  13.1%
6                 103513  6.9%
1                 49942   3.3%
8                 49942   3.3%
+                 18941   1.3%
<                 12236   0.8%
9                 15      < 0.1%
Other values (3)  13      < 0.1%

Value  Count   Frequency (%)
N      300537  42.9%
U      100179  14.3%
K      100179  14.3%
O      100179  14.3%
W      100179  14.3%

Most occurring blocks

Value  Count    Frequency (%)
ASCII  2205025  100.0%

Most frequent character per block

Value             Count   Frequency (%)
4                 526374  23.9%
N                 300537  13.6%
-                 282048  12.8%
5                 263278  11.9%
2                 197470  9.0%
6                 103513  4.7%
U                 100179  4.5%
K                 100179  4.5%
O                 100179  4.5%
W                 100179  4.5%
Other values (8)  131089  5.9%

VIC_RACE
Categorical

Distinct      7
Distinct (%)  < 0.1%
Missing       1
Missing (%)   < 0.1%
Memory size   3.2 MiB

UNKNOWN                   109813
BLACK                     109704
WHITE HISPANIC            76211
WHITE                     66066
ASIAN / PACIFIC ISLANDER  32312
Other values (2)          19305

Length

Max length     30
Median length  7
Mean length    9.147073977
Min length     5

Characters and Unicode

Total characters     3781501
Distinct characters  22
Distinct categories  3
Distinct scripts     2
Distinct blocks      1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique      0
Unique (%)  0.0%

Sample

1st row  BLACK
2nd row  BLACK
3rd row  BLACK
4th row  BLACK
5th row  BLACK HISPANIC

Value                           Count   Frequency (%)
UNKNOWN                         109813  26.6%
BLACK                           109704  26.5%
WHITE HISPANIC                  76211   18.4%
WHITE                           66066   16.0%
ASIAN / PACIFIC ISLANDER        32312   7.8%
BLACK HISPANIC                  17977   4.3%
AMERICAN INDIAN/ALASKAN NATIVE  1328    0.3%
(Missing)                       1       < 0.1%
Histogram of lengths of the category
Value     Count   Frequency (%)
white     142277  23.4%
black     127681  21.0%
unknown   109813  18.1%
hispanic  94188   15.5%
pacific   32312   5.3%
islander  32312   5.3%
asian     32312   5.3%
(empty)   32312   5.3%
american  1328    0.2%
native    1328    0.2%

Most occurring characters

Value              Count   Frequency (%)
N                  494891  13.1%
I                  465213  12.3%
A                  360413  9.5%
C                  287821  7.6%
W                  252090  6.7%
K                  238822  6.3%
H                  236465  6.3%
(space)            193780  5.1%
E                  177245  4.7%
L                  161321  4.3%
Other values (12)  913440  24.2%

Most occurring categories

Value              Count    Frequency (%)
Uppercase Letter   3554081  94.0%
Space Separator    193780   5.1%
Other Punctuation  33640    0.9%

Most frequent character per category

Value              Count   Frequency (%)
N                  494891  13.9%
I                  465213  13.1%
A                  360413  10.1%
C                  287821  8.1%
W                  252090  7.1%
K                  238822  6.7%
H                  236465  6.7%
E                  177245  5.0%
L                  161321  4.5%
S                  160140  4.5%
Other values (10)  719660  20.2%

Value    Count   Frequency (%)
(space)  193780  100.0%

Value  Count  Frequency (%)
/      33640  100.0%

Most occurring scripts

Value   Count    Frequency (%)
Latin   3554081  94.0%
Common  227420   6.0%

Most frequent character per script

Value              Count   Frequency (%)
N                  494891  13.9%
I                  465213  13.1%
A                  360413  10.1%
C                  287821  8.1%
W                  252090  7.1%
K                  238822  6.7%
H                  236465  6.7%
E                  177245  5.0%
L                  161321  4.5%
S                  160140  4.5%
Other values (10)  719660  20.2%

Value    Count   Frequency (%)
(space)  193780  85.2%
/        33640   14.8%

Most occurring blocks

Value  Count    Frequency (%)
ASCII  3781501  100.0%

Most frequent character per block

Value              Count   Frequency (%)
N                  494891  13.1%
I                  465213  12.3%
A                  360413  9.5%
C                  287821  7.6%
W                  252090  6.7%
K                  238822  6.3%
H                  236465  6.3%
(space)            193780  5.1%
E                  177245  4.7%
L                  161321  4.3%
Other values (12)  913440  24.2%

VIC_SEX
Categorical

Distinct      4
Distinct (%)  < 0.1%
Missing       1
Missing (%)   < 0.1%
Memory size   3.2 MiB

F  164503
M  152219
D  63160
E  33529

Length

Max length     1
Median length  1
Mean length    1
Min length     1

Characters and Unicode

Total characters     413411
Distinct characters  4
Distinct categories  1
Distinct scripts     1
Distinct blocks      1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique      0
Unique (%)  0.0%

Sample

1st row  M
2nd row  M
3rd row  F
4th row  F
5th row  M

Value      Count   Frequency (%)
F          164503  39.8%
M          152219  36.8%
D          63160   15.3%
E          33529   8.1%
(Missing)  1       < 0.1%
Histogram of lengths of the category
Value  Count   Frequency (%)
f      164503  39.8%
m      152219  36.8%
d      63160   15.3%
e      33529   8.1%

Most occurring characters

Value  Count   Frequency (%)
F      164503  39.8%
M      152219  36.8%
D      63160   15.3%
E      33529   8.1%

Most occurring categories

Value             Count   Frequency (%)
Uppercase Letter  413411  100.0%

Most frequent character per category

Value  Count   Frequency (%)
F      164503  39.8%
M      152219  36.8%
D      63160   15.3%
E      33529   8.1%

Most occurring scripts

Value  Count   Frequency (%)
Latin  413411  100.0%

Most frequent character per script

Value  Count   Frequency (%)
F      164503  39.8%
M      152219  36.8%
D      63160   15.3%
E      33529   8.1%

Most occurring blocks

Value  Count   Frequency (%)
ASCII  413411  100.0%

Most frequent character per block

Value  Count   Frequency (%)
F      164503  39.8%
M      152219  36.8%
D      63160   15.3%
E      33529   8.1%

X_COORD_CD
Real number (ℝ≥0)

HIGH CORRELATION

Distinct      47204
Distinct (%)  11.4%
Missing       0
Missing (%)   0.0%
Infinite      0
Infinite (%)  0.0%
Mean          1005765.402
Minimum       913411
Maximum       1067185
Zeros         0
Zeros (%)     0.0%
Memory size   3.2 MiB

Quantile statistics

Minimum                    913411
5-th percentile            979467
Q1                         993253
median                     1005026
Q3                         1017356
95-th percentile           1043751.35
Maximum                    1067185
Range                      153774
Interquartile range (IQR)  24103

Descriptive statistics

Standard deviation               21261.6037
Coefficient of variation (CV)    0.02113972468
Kurtosis                         1.507604224
Mean                             1005765.402
Median Absolute Deviation (MAD)  12093
Skewness                         -0.2529144364
Sum                              4.157954865 × 10^11
Variance                         452055791.7
Monotonicity                     Not monotonic
Histogram with fixed size bins (bins=50)
Value                 Count   Frequency (%)
987220                939     0.2%
989211                537     0.1%
1019840               412     0.1%
991655                405     0.1%
997880                399     0.1%
1005725               386     0.1%
1006537               326     0.1%
1004138               310     0.1%
1017141               304     0.1%
1020754               302     0.1%
Other values (47194)  409092  99.0%

Value   Count  Frequency (%)
913411  1      < 0.1%
913512  2      < 0.1%
913784  1      < 0.1%
913819  1      < 0.1%
913853  1      < 0.1%

Value    Count  Frequency (%)
1067185  10     < 0.1%
1067117  1      < 0.1%
1067083  1      < 0.1%
1067053  4      < 0.1%
1067000  1      < 0.1%

Y_COORD_CD
Real number (ℝ≥0)

HIGH CORRELATION

Distinct      50133
Distinct (%)  12.1%
Missing       0
Missing (%)   0.0%
Infinite      0
Infinite (%)  0.0%
Mean          207757.3739
Minimum       121131
Maximum       271820
Zeros         0
Zeros (%)     0.0%
Memory size   3.2 MiB

Quantile statistics

Minimum                    121131
5-th percentile            157683
Q1                         185117.5
median                     206689
Q3                         235517
95-th percentile           255040
Maximum                    271820
Range                      150689
Interquartile range (IQR)  50399.5

Descriptive statistics

Standard deviation               30289.58164
Coefficient of variation (CV)    0.1457930521
Kurtosis                         -0.8934401953
Mean                             207757.3739
Median Absolute Deviation (MAD)  24288
Skewness                         -0.03340722637
Sum                              8.588939146 × 10^10
Variance                         917458755.7
Monotonicity                     Not monotonic
Histogram with fixed size bins (bins=50)
Value                 Count   Frequency (%)
212676                944     0.2%
222871                527     0.1%
206689                411     0.1%
213390                410     0.1%
192557                399     0.1%
249742                378     0.1%
244511                328     0.1%
183798                313     0.1%
209365                304     0.1%
215043                301     0.1%
Other values (50123)  409097  99.0%

Value   Count  Frequency (%)
121131  2      < 0.1%
121508  2      < 0.1%
121611  4      < 0.1%
121674  4      < 0.1%
121736  1      < 0.1%

Value   Count  Frequency (%)
271820  11     < 0.1%
271730  1      < 0.1%
271551  1      < 0.1%
271424  1      < 0.1%
271304  7      < 0.1%

Latitude
Real number (ℝ≥0)

HIGH CORRELATION

Distinct      67399
Distinct (%)  16.3%
Missing       0
Missing (%)   0.0%
Infinite      0
Infinite (%)  0.0%
Mean          40.73687867
Minimum       40.49890536
Maximum       40.9127234
Zeros         0
Zeros (%)     0.0%
Memory size   3.2 MiB

Quantile statistics

Minimum                    40.49890536
5-th percentile            40.59944719
Q1                         40.67474383
median                     40.73394232
Q3                         40.81310735
95-th percentile           40.86664779
Maximum                    40.9127234
Range                      0.413818033
Interquartile range (IQR)  0.138363516

Descriptive statistics

Standard deviation               0.08314194231
Coefficient of variation (CV)    0.00204095024
Kurtosis                         -0.8934189037
Mean                             40.73687867
Median Absolute Deviation (MAD)  0.0666757345
Skewness                         -0.03363583397
Sum                              16841114.48
Variance                         0.006912582571
Monotonicity                     Not monotonic
Histogram with fixed size bins (bins=50)
Value                 Count   Frequency (%)
40.75043077           937     0.2%
40.77841252           526     0.1%
40.73392684           411     0.1%
40.75238792           405     0.1%
40.69519898           396     0.1%
40.85214119           378     0.1%
40.83778162           317     0.1%
40.67110691           304     0.1%
40.74134137           301     0.1%
40.6517009            300     0.1%
Other values (67389)  409137  99.0%

Value        Count  Frequency (%)
40.49890536  2      < 0.1%
40.49994754  2      < 0.1%
40.50021598  4      < 0.1%
40.50039077  4      < 0.1%
40.50056279  1      < 0.1%

Value        Count  Frequency (%)
40.9127234   11     < 0.1%
40.91247643  1      < 0.1%
40.91198211  1      < 0.1%
40.91163433  1      < 0.1%
40.91130746  7      < 0.1%

Longitude
Real number (ℝ)

HIGH CORRELATION

Distinct      67400
Distinct (%)  16.3%
Missing       0
Missing (%)   0.0%
Infinite      0
Infinite (%)  0.0%
Mean          -73.9223374
Minimum       -74.25474319
Maximum       -73.70072029
Zeros         0
Zeros (%)     0.0%
Memory size   3.2 MiB

Quantile statistics

Minimum                    -74.25474319
5-th percentile            -74.01723611
Q1                         -73.96750176
median                     -73.92500172
Q3                         -73.88039168
95-th percentile           -73.7854175
Maximum                    -73.70072029
Range                      0.554022895
Interquartile range (IQR)  0.087110078

Descriptive statistics

Standard deviation               0.07667732122
Coefficient of variation (CV)    -0.001037268624
Kurtosis                         1.495141859
Mean                             -73.9223374
Median Absolute Deviation (MAD)  0.043568766
Skewness                         -0.2519878689
Sum                              -30560381.35
Variance                         0.005879411589
Monotonicity                     Not monotonic
Histogram with fixed size bins (bins=50)
Value                 Count   Frequency (%)
-73.98928218          937     0.2%
-73.98208877          526     0.1%
-73.8715824           411     0.1%
-73.97327466          405     0.1%
-73.95084903          396     0.1%
-73.92237572          378     0.1%
-73.91945797          317     0.1%
-73.88143296          304     0.1%
-73.97839261          301     0.1%
-73.86844675          300     0.1%
Other values (67390)  409137  99.0%

Value         Count  Frequency (%)
-74.25474319  1      < 0.1%
-74.254377    2      < 0.1%
-74.25340303  1      < 0.1%
-74.25325746  1      < 0.1%
-74.25314827  1      < 0.1%

Value         Count  Frequency (%)
-73.70072029  10     < 0.1%
-73.7009566   1      < 0.1%
-73.70107442  1      < 0.1%
-73.7011785   4      < 0.1%
-73.70138638  1      < 0.1%

Lat_Lon
Categorical

HIGH CARDINALITY

Distinct      67403
Distinct (%)  16.3%
Missing       0
Missing (%)   0.0%
Memory size   3.2 MiB

(40.75043076800005, -73.98928217599996)  937
(40.77841252300004, -73.98208876999998)  526
(40.73392684100002, -73.87158239799999)  411
(40.75238791700008, -73.97327466399997)  405
(40.69519897600002, -73.95084903199995)  396
Other values (67398)                     410737

Length

Max length     40
Median length  39
Mean length    39.03702118
Min length     31

Characters and Unicode

Total characters     16138373
Distinct characters  16
Distinct categories  6
Distinct scripts     1
Distinct blocks      1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique      20593
Unique (%)  5.0%

Sample

1st row  (40.62576896100006, -73.99141682199996)
2nd row  (40.67458330800008, -73.93022154099998)
3rd row  (40.82310129900002, -73.86969046099993)
4th row  (40.88745131300004, -73.84760778699997)
5th row  (40.80022202900005, -73.93084834199995)

Value                                    Count   Frequency (%)
(40.75043076800005, -73.98928217599996)  937     0.2%
(40.77841252300004, -73.98208876999998)  526     0.1%
(40.73392684100002, -73.87158239799999)  411     0.1%
(40.75238791700008, -73.97327466399997)  405     0.1%
(40.69519897600002, -73.95084903199995)  396     0.1%
(40.85214118700002, -73.92237572199997)  378     0.1%
(40.83778161800007, -73.91945797099999)  317     0.1%
(40.67110691100004, -73.88143295699997)  304     0.1%
(40.74134137300007, -73.97839260899997)  301     0.1%
(40.65170090400005, -73.86844675099996)  300     0.1%
Other values (67393)                     409137  99.0%
Histogram of lengths of the category
Value                  Count   Frequency (%)
40.75043076800005      937     0.1%
73.98928217599996      937     0.1%
73.98208876999998      526     0.1%
40.77841252300004      526     0.1%
73.87158239799999      411     < 0.1%
40.73392684100002      411     < 0.1%
40.75238791700008      405     < 0.1%
73.97327466399997      405     < 0.1%
73.95084903199995      396     < 0.1%
40.69519897600002      396     < 0.1%
Other values (134789)  821474  99.4%

Most occurring characters

Value             Count    Frequency (%)
0                 2757116  17.1%
9                 2491983  15.4%
7                 1407590  8.7%
4                 1286633  8.0%
3                 1140342  7.1%
8                 1014050  6.3%
6                 935385   5.8%
5                 857928   5.3%
.                 826824   5.1%
2                 685119   4.2%
Other values (6)  2735403  16.9%

Most occurring categories

Value              Count     Frequency (%)
Decimal Number     13244489  82.1%
Other Punctuation  1240236   7.7%
Open Punctuation   413412    2.6%
Space Separator    413412    2.6%
Dash Punctuation   413412    2.6%
Close Punctuation  413412    2.6%

Most frequent character per category

Value  Count    Frequency (%)
0      2757116  20.8%
9      2491983  18.8%
7      1407590  10.6%
4      1286633  9.7%
3      1140342  8.6%
8      1014050  7.7%
6      935385   7.1%
5      857928   6.5%
2      685119   5.2%
1      668343   5.0%

Value  Count   Frequency (%)
.      826824  66.7%
,      413412  33.3%

Value  Count   Frequency (%)
(      413412  100.0%

Value    Count   Frequency (%)
(space)  413412  100.0%

Value  Count   Frequency (%)
-      413412  100.0%

Value  Count   Frequency (%)
)      413412  100.0%

Most occurring scripts

Value   Count     Frequency (%)
Common  16138373  100.0%

Most frequent character per script

Value             Count    Frequency (%)
0                 2757116  17.1%
9                 2491983  15.4%
7                 1407590  8.7%
4                 1286633  8.0%
3                 1140342  7.1%
8                 1014050  6.3%
6                 935385   5.8%
5                 857928   5.3%
.                 826824   5.1%
2                 685119   4.2%
Other values (6)  2735403  16.9%

Most occurring blocks

Value  Count     Frequency (%)
ASCII  16138373  100.0%

Most frequent character per block

Value             Count    Frequency (%)
0                 2757116  17.1%
9                 2491983  15.4%
7                 1407590  8.7%
4                 1286633  8.0%
3                 1140342  7.1%
8                 1014050  6.3%
6                 935385   5.8%
5                 857928   5.3%
.                 826824   5.1%
2                 685119   4.2%
Other values (6)  2735403  16.9%

New Georeferenced Column
Categorical

HIGH CARDINALITY

Distinct      67403
Distinct (%)  16.3%
Missing       0
Missing (%)   0.0%
Memory size   3.2 MiB

POINT (-73.98928217599996 40.75043076800005)  937
POINT (-73.98208876999998 40.77841252300004)  526
POINT (-73.87158239799999 40.73392684100002)  411
POINT (-73.97327466399997 40.75238791700008)  405
POINT (-73.95084903199995 40.69519897600002)  396
Other values (67398)                          410737

Length

Max length     45
Median length  44
Mean length    44.03702118
Min length     36

Characters and Unicode

Total characters     18205433
Distinct characters  20
Distinct categories  7
Distinct scripts     2
Distinct blocks      1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique      20593
Unique (%)  5.0%

Sample

1st row  POINT (-73.99141682199996 40.62576896100006)
2nd row  POINT (-73.93022154099998 40.67458330800008)
3rd row  POINT (-73.86969046099993 40.82310129900002)
4th row  POINT (-73.84760778699997 40.88745131300004)
5th row  POINT (-73.93084834199995 40.80022202900005)

Value                                         Count   Frequency (%)
POINT (-73.98928217599996 40.75043076800005)  937     0.2%
POINT (-73.98208876999998 40.77841252300004)  526     0.1%
POINT (-73.87158239799999 40.73392684100002)  411     0.1%
POINT (-73.97327466399997 40.75238791700008)  405     0.1%
POINT (-73.95084903199995 40.69519897600002)  396     0.1%
POINT (-73.92237572199997 40.85214118700002)  378     0.1%
POINT (-73.91945797099999 40.83778161800007)  317     0.1%
POINT (-73.88143295699997 40.67110691100004)  304     0.1%
POINT (-73.97839260899997 40.74134137300007)  301     0.1%
POINT (-73.86844675099996 40.65170090400005)  300     0.1%
Other values (67393)                          409137  99.0%
Histogram of lengths of the category
Value                  Count   Frequency (%)
point                  413412  33.3%
40.75043076800005      937     0.1%
73.98928217599996      937     0.1%
73.98208876999998      526     < 0.1%
40.77841252300004      526     < 0.1%
40.73392684100002      411     < 0.1%
73.87158239799999      411     < 0.1%
73.97327466399997      405     < 0.1%
40.75238791700008      405     < 0.1%
73.95084903199995      396     < 0.1%
Other values (134790)  821870  66.3%

Most occurring characters

Value              Count    Frequency (%)
0                  2757116  15.1%
9                  2491983  13.7%
7                  1407590  7.7%
4                  1286633  7.1%
3                  1140342  6.3%
8                  1014050  5.6%
6                  935385   5.1%
5                  857928   4.7%
(space)            826824   4.5%
.                  826824   4.5%
Other values (10)  4660758  25.6%

Most occurring categories

Value              Count     Frequency (%)
Decimal Number     13244489  72.8%
Uppercase Letter   2067060   11.4%
Space Separator    826824    4.5%
Other Punctuation  826824    4.5%
Open Punctuation   413412    2.3%
Dash Punctuation   413412    2.3%
Close Punctuation  413412    2.3%

Most frequent character per category

Value  Count    Frequency (%)
0      2757116  20.8%
9      2491983  18.8%
7      1407590  10.6%
4      1286633  9.7%
3      1140342  8.6%
8      1014050  7.7%
6      935385   7.1%
5      857928   6.5%
2      685119   5.2%
1      668343   5.0%

Value  Count   Frequency (%)
P      413412  20.0%
O      413412  20.0%
I      413412  20.0%
N      413412  20.0%
T      413412  20.0%

Value    Count   Frequency (%)
(space)  826824  100.0%

Value  Count   Frequency (%)
(      413412  100.0%

Value  Count   Frequency (%)
-      413412  100.0%

Value  Count   Frequency (%)
.      826824  100.0%

Value  Count   Frequency (%)
)      413412  100.0%

Most occurring scripts

Value   Count     Frequency (%)
Common  16138373  88.6%
Latin   2067060   11.4%

Most frequent character per script

Value             Count    Frequency (%)
0                 2757116  17.1%
9                 2491983  15.4%
7                 1407590  8.7%
4                 1286633  8.0%
3                 1140342  7.1%
8                 1014050  6.3%
6                 935385   5.8%
5                 857928   5.3%
(space)           826824   5.1%
.                 826824   5.1%
Other values (5)  2593698  16.1%

Value  Count   Frequency (%)
P      413412  20.0%
O      413412  20.0%
I      413412  20.0%
N      413412  20.0%
T      413412  20.0%

Most occurring blocks

Value  Count     Frequency (%)
ASCII  18205433  100.0%

Most frequent character per block

Value              Count    Frequency (%)
0                  2757116  15.1%
9                  2491983  13.7%
7                  1407590  7.7%
4                  1286633  7.1%
3                  1140342  6.3%
8                  1014050  5.6%
6                  935385   5.1%
5                  857928   4.7%
(space)            826824   4.5%
.                  826824   4.5%
Other values (10)  4660758  25.6%

Interactions

[Interaction plots: 110 figures, images not preserved]

Correlations


Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, so for an exactly linear relationship the steepness of the line does not affect r (only its direction sets the sign).

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
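
The definition above can be sketched in a few lines of plain Python. This is a minimal illustration rather than the report's own implementation; the function name `pearson_r` and the sample data are assumptions made for the example.

```python
import math


def pearson_r(xs, ys):
    """Pearson's r: covariance of X and Y divided by the product of their standard deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))
    return cov / (std_x * std_y)


# Made-up, nearly linear data.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.1, 5.9, 8.2, 9.8]

r = pearson_r(xs, ys)
# Shifting and positively rescaling each variable separately leaves r unchanged.
r_shifted = pearson_r([10 * x + 3 for x in xs], [0.5 * y - 7 for y in ys])
# Reversing the direction of the relationship flips the sign.
r_negated = pearson_r(xs, [-y for y in ys])

print(r, r_shifted, r_negated)
```

The same invariance explains why X_COORD_CD and Longitude can be flagged as highly correlated even though they are expressed in very different units.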

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
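
A minimal sketch of that recipe follows: rank each variable (ties receive the average of the positions they span), then apply Pearson's formula to the ranks. The helper names and the sample data are illustrative assumptions, not the report's code.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson_r(xs, ys):
    """Covariance divided by the product of standard deviations (the 1/(n-1) factors cancel)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def spearman_rho(xs, ys):
    """Spearman's rho: Pearson's r computed on the rank variables."""
    return pearson_r(average_ranks(xs), average_ranks(ys))


# A monotonic but nonlinear relationship: y = x cubed.
xs = [1, 2, 3, 4, 5]
ys = [x ** 3 for x in xs]

rho = spearman_rho(xs, ys)
r = pearson_r(xs, ys)
print(rho, r)
```

On this data ρ reaches its maximum because the ordering is perfectly preserved, while Pearson's r stays below 1 because the relationship is not linear.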

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
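
The pair-counting definition above can be sketched directly. This version is the plain form described in the text (often called tau-a); statistical libraries commonly default to a tie-corrected variant, so results may differ when the data contain ties. The function name and sample data are illustrative.

```python
from itertools import combinations


def kendall_tau_a(xs, ys):
    """(concordant pairs - discordant pairs) / total number of pairs."""
    concordant = 0
    discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        direction = (x1 - x2) * (y1 - y2)
        if direction > 0:
            concordant += 1      # both variables move in the same direction
        elif direction < 0:
            discordant += 1      # the variables move in opposite directions
        # Pairs tied in x or y count as neither.
    n = len(xs)
    total_pairs = n * (n - 1) // 2
    return (concordant - discordant) / total_pairs


xs = [1, 2, 3, 4]
ys = [1, 3, 2, 4]
tau = kendall_tau_a(xs, ys)
print(tau)
```

Of the six pairs in this example, five are concordant and one (the second and third observations) is discordant, giving τ = 4/6.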

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available from the Phik project.

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. We use the bias-corrected measure proposed by Bergsma in 2013.
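
As a rough sketch of the bias-corrected estimator, the code below computes the chi-squared statistic from a contingency table, then applies the small-sample corrections to φ² and to the row and column counts before taking the square root. The function name and the sample data are assumptions for illustration; the report's own implementation may differ in detail.

```python
import math
from collections import Counter


def cramers_v_corrected(xs, ys):
    """Bias-corrected Cramér's V for two equally long sequences of category labels."""
    n = len(xs)
    row_counts = Counter(xs)
    col_counts = Counter(ys)
    cell_counts = Counter(zip(xs, ys))

    # Pearson chi-squared statistic over the full contingency table.
    chi2 = 0.0
    for a, row_total in row_counts.items():
        for b, col_total in col_counts.items():
            expected = row_total * col_total / n
            observed = cell_counts.get((a, b), 0)
            chi2 += (observed - expected) ** 2 / expected

    phi2 = chi2 / n
    r, k = len(row_counts), len(col_counts)
    # Corrections: shrink phi^2 and the table dimensions.
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(k_corr - 1, r_corr - 1)
    if denom <= 0:
        return 0.0
    return math.sqrt(phi2_corr / denom)


# Made-up labels: one perfectly associated pair, one independent pair.
v_perfect = cramers_v_corrected(["a", "b"] * 50, ["x", "y"] * 50)
v_independent = cramers_v_corrected(["a", "a", "b", "b"] * 25, ["x", "y", "x", "y"] * 25)
print(v_perfect, v_independent)
```

The clamp to zero matters for the independent case: without it the corrected φ² would be slightly negative and the square root undefined.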

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion by eye.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
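
One way to read nullity correlation is as the ordinary Pearson correlation between presence indicators (1 where a value is present, 0 where it is missing). The sketch below illustrates that idea in plain Python. The column values are made up for the example, and the helper name is hypothetical; the plotting library used for the heatmap may handle edge cases differently.

```python
import math


def nullity_correlation(col_a, col_b):
    """Pearson correlation between the presence indicators of two columns (None = missing)."""
    a = [0.0 if v is None else 1.0 for v in col_a]
    b = [0.0 if v is None else 1.0 for v in col_b]
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        # A column that is entirely present or entirely missing carries no signal.
        return float("nan")
    return cov / math.sqrt(var_a * var_b)


# Made-up values: the end date and end time are missing on the same rows.
cmplnt_to_dt = ["01/02/2020", None, "01/05/2020", None, "02/01/2020", "02/03/2020"]
cmplnt_to_tm = ["10:00:00", None, "11:30:00", None, "09:15:00", "18:45:00"]
# A hypothetical column that is filled exactly where the others are missing.
opposite = [None, "A", None, "B", None, None]
always_present = ["x"] * 6

same = nullity_correlation(cmplnt_to_dt, cmplnt_to_tm)
inverse = nullity_correlation(cmplnt_to_dt, opposite)
undefined = nullity_correlation(cmplnt_to_dt, always_present)
print(same, inverse, undefined)
```

A value near +1 means the two columns tend to be missing together, a value near -1 means one tends to be present when the other is absent, and fully populated columns yield no defined value.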
The dendrogram allows you to more fully correlate variable completion, revealing trends deeper than the pairwise ones visible in the correlation heatmap.

Sample

First rows

|  | CMPLNT_NUM | ADDR_PCT_CD | BORO_NM | CMPLNT_FR_DT | CMPLNT_FR_TM | CMPLNT_TO_DT | CMPLNT_TO_TM | CRM_ATPT_CPTD_CD | HADEVELOPT | HOUSING_PSA | JURISDICTION_CODE | JURIS_DESC | KY_CD | LAW_CAT_CD | LOC_OF_OCCUR_DESC | OFNS_DESC | PARKS_NM | PATROL_BORO | PD_CD | PD_DESC | PREM_TYP_DESC | RPT_DT | STATION_NAME | SUSP_AGE_GROUP | SUSP_RACE | SUSP_SEX | TRANSIT_DISTRICT | VIC_AGE_GROUP | VIC_RACE | VIC_SEX | X_COORD_CD | Y_COORD_CD | Latitude | Longitude | Lat_Lon | New Georeferenced Column |
| 0 | 885776788 | 66 | NaN | 12/23/2020 | 19:50:00 | NaN | NaN | COMPLETED | NaN | NaN | NaN | N.Y. POLICE DEPT | 101 | FELONY | OUTSIDE | MURDER & NON-NEGL. MANSLAUGHTER | NaN | NaN | NaN | NaN | NaN | 12/23/2020 | NaN | NaN | NaN | NaN | NaN | 18-24 | BLACK | M | 986633 | 167258 | 40.625769 | -73.991417 | (40.62576896100006, -73.99141682199996) | POINT (-73.99141682199996 40.62576896100006) |
| 1 | 350637195 | 77 | NaN | 12/21/2020 | 01:10:00 | NaN | NaN | COMPLETED | NaN | NaN | NaN | N.Y. POLICE DEPT | 101 | FELONY | INSIDE | MURDER & NON-NEGL. MANSLAUGHTER | NaN | NaN | NaN | NaN | NaN | 12/21/2020 | NaN | NaN | NaN | NaN | NaN | 25-44 | BLACK | M | 1003606 | 185050 | 40.674583 | -73.930222 | (40.67458330800008, -73.93022154099998) | POINT (-73.93022154099998 40.67458330800008) |
| 2 | 347843168 | 43 | BRONX | 11/22/2020 | 22:00:00 | NaN | NaN | COMPLETED | NaN | NaN | 0.0 | N.Y. POLICE DEPT | 104 | FELONY | NaN | RAPE | NaN | PATROL BORO BRONX | 157.0 | RAPE 1 | STREET | 11/23/2020 | NaN | UNKNOWN | UNKNOWN | U | NaN | 25-44 | BLACK | F | 1020316 | 239179 | 40.823101 | -73.869690 | (40.82310129900002, -73.86969046099993) | POINT (-73.86969046099993 40.82310129900002) |
| 3 | 197941396 | 47 | NaN | 11/22/2020 | 09:50:00 | NaN | NaN | COMPLETED | NaN | NaN | NaN | N.Y. POLICE DEPT | 101 | FELONY | INSIDE | MURDER & NON-NEGL. MANSLAUGHTER | NaN | NaN | NaN | NaN | NaN | 11/22/2020 | NaN | 25-44 | BLACK | M | NaN | 25-44 | BLACK | F | 1026387 | 262634 | 40.887451 | -73.847608 | (40.88745131300004, -73.84760778699997) | POINT (-73.84760778699997 40.88745131300004) |
| 4 | 298404927 | 25 | NaN | 11/21/2020 | 15:38:00 | NaN | NaN | COMPLETED | NaN | NaN | NaN | N.Y. HOUSING POLICE | 101 | FELONY | OUTSIDE | MURDER & NON-NEGL. MANSLAUGHTER | NaN | NaN | NaN | NaN | NaN | 11/21/2020 | NaN | NaN | NaN | NaN | NaN | 18-24 | BLACK HISPANIC | M | 1003396 | 230824 | 40.800222 | -73.930848 | (40.80022202900005, -73.93084834199995) | POINT (-73.93084834199995 40.80022202900005) |
| 5 | 549342890 | 44 | NaN | 11/05/2020 | 09:40:00 | NaN | NaN | COMPLETED | NaN | NaN | NaN | N.Y. POLICE DEPT | 101 | FELONY | OUTSIDE | MURDER & NON-NEGL. MANSLAUGHTER | NaN | NaN | NaN | NaN | NaN | 11/05/2020 | NaN | NaN | NaN | NaN | NaN | 18-24 | WHITE HISPANIC | M | 1006434 | 244344 | 40.837324 | -73.919831 | (40.83732351100008, -73.91983075699994) | POINT (-73.91983075699994 40.83732351100008) |
| 6 | 921351410 | 28 | NaN | 11/04/2020 | 09:14:00 | NaN | NaN | COMPLETED | NaN | NaN | NaN | N.Y. POLICE DEPT | 101 | FELONY | OUTSIDE | MURDER & NON-NEGL. MANSLAUGHTER | NaN | NaN | NaN | NaN | NaN | 11/04/2020 | NaN | 25-44 | BLACK | M | NaN | 25-44 | BLACK | M | 997670 | 230545 | 40.799467 | -73.951531 | (40.799466801000044, -73.95153053599995) | POINT (-73.95153053599995 40.799466801000044) |
| 7 | 452350235 | 44 | NaN | 11/02/2020 | 18:30:00 | NaN | NaN | COMPLETED | NaN | NaN | NaN | N.Y. POLICE DEPT | 101 | FELONY | INSIDE | MURDER & NON-NEGL. MANSLAUGHTER | NaN | NaN | NaN | NaN | NaN | 11/02/2020 | NaN | <18 | WHITE HISPANIC | M | NaN | 18-24 | BLACK | F | 1006999 | 245897 | 40.841585 | -73.917784 | (40.841584606000026, -73.91778363799993) | POINT (-73.91778363799993 40.841584606000026) |
| 8 | 714801710 | 110 | NaN | 11/01/2020 | 01:20:00 | NaN | NaN | COMPLETED | NaN | NaN | NaN | N.Y. POLICE DEPT | 101 | FELONY | OUTSIDE | MURDER & NON-NEGL. MANSLAUGHTER | NaN | NaN | NaN | NaN | NaN | 11/01/2020 | NaN | NaN | NaN | NaN | NaN | 25-44 | WHITE HISPANIC | M | 1016573 | 210045 | 40.743151 | -73.883355 | (40.74315076400006, -73.88335454299995) | POINT (-73.88335454299995 40.74315076400006) |
| 9 | 985956017 | 63 | NaN | 10/31/2020 | 01:50:00 | NaN | NaN | COMPLETED | NaN | NaN | NaN | N.Y. POLICE DEPT | 101 | FELONY | OUTSIDE | MURDER & NON-NEGL. MANSLAUGHTER | NaN | NaN | NaN | NaN | NaN | 10/31/2020 | NaN | NaN | NaN | NaN | NaN | 25-44 | BLACK | M | 1005075 | 169926 | 40.633068 | -73.924972 | (40.63306790400002, -73.92497238099996) | POINT (-73.92497238099996 40.63306790400002) |

Last rows

|  | CMPLNT_NUM | ADDR_PCT_CD | BORO_NM | CMPLNT_FR_DT | CMPLNT_FR_TM | CMPLNT_TO_DT | CMPLNT_TO_TM | CRM_ATPT_CPTD_CD | HADEVELOPT | HOUSING_PSA | JURISDICTION_CODE | JURIS_DESC | KY_CD | LAW_CAT_CD | LOC_OF_OCCUR_DESC | OFNS_DESC | PARKS_NM | PATROL_BORO | PD_CD | PD_DESC | PREM_TYP_DESC | RPT_DT | STATION_NAME | SUSP_AGE_GROUP | SUSP_RACE | SUSP_SEX | TRANSIT_DISTRICT | VIC_AGE_GROUP | VIC_RACE | VIC_SEX | X_COORD_CD | Y_COORD_CD | Latitude | Longitude | Lat_Lon | New Georeferenced Column |
| 413402 | 296310647 | 70 | BROOKLYN | 01/02/2020 | 18:00:00 | 01/02/2020 | 18:10:00 | COMPLETED | NaN | NaN | 0.0 | N.Y. POLICE DEPT | 351 | MISDEMEANOR | INSIDE | CRIMINAL MISCHIEF & RELATED OF | NaN | PATROL BORO BKLYN SOUTH | 259.0 | CRIMINAL MISCHIEF,UNCLASSIFIED 4 | RESIDENCE - APT. HOUSE | 01/02/2020 | NaN | UNKNOWN | BLACK HISPANIC | U | NaN | 45-64 | BLACK | F | 996193 | 176571 | 40.651323 | -73.956961 | (40.65132343000005, -73.95696100099997) | POINT (-73.95696100099997 40.65132343000005) |
| 413403 | 177612365 | 30 | MANHATTAN | 01/01/2020 | 00:30:00 | 01/01/2020 | 01:00:00 | COMPLETED | NaN | NaN | 0.0 | N.Y. POLICE DEPT | 106 | FELONY | NaN | FELONY ASSAULT | NaN | PATROL BORO MAN NORTH | 109.0 | ASSAULT 2,1,UNCLASSIFIED | STREET | 01/04/2020 | NaN | 25-44 | BLACK | M | NaN | 25-44 | BLACK | M | 997701 | 239900 | 40.825144 | -73.951400 | (40.825143625000074, -73.95139982299997) | POINT (-73.95139982299997 40.825143625000074) |
| 413404 | 710352058 | 106 | QUEENS | 12/16/2019 | 09:00:00 | 12/17/2019 | 17:00:00 | ATTEMPTED | NaN | NaN | 0.0 | N.Y. POLICE DEPT | 109 | FELONY | INSIDE | GRAND LARCENY | NaN | PATROL BORO QUEENS SOUTH | 430.0 | LARCENY,GRAND BY BANK ACCT COMPROMISE-UNCLASSIFIED | RESIDENCE-HOUSE | 01/03/2020 | NaN | NaN | NaN | NaN | NaN | 45-64 | WHITE | M | 1027072 | 185804 | 40.676570 | -73.845620 | (40.67657045900006, -73.84562010099995) | POINT (-73.84562010099995 40.67657045900006) |
| 413405 | 718164907 | 49 | BRONX | 01/05/2020 | 11:45:00 | 01/05/2020 | 11:57:00 | COMPLETED | NaN | NaN | 0.0 | N.Y. POLICE DEPT | 351 | MISDEMEANOR | INSIDE | CRIMINAL MISCHIEF & RELATED OF | NaN | PATROL BORO BRONX | 259.0 | CRIMINAL MISCHIEF,UNCLASSIFIED 4 | RESIDENCE-HOUSE | 01/05/2020 | NaN | 25-44 | WHITE | M | NaN | 25-44 | WHITE HISPANIC | F | 1023379 | 249358 | 40.851027 | -73.858564 | (40.85102663600002, -73.85856409399997) | POINT (-73.85856409399997 40.85102663600002) |
| 413406 | 339790489 | 14 | MANHATTAN | 01/04/2020 | 21:43:00 | 01/04/2020 | 21:44:00 | COMPLETED | NaN | NaN | 0.0 | N.Y. POLICE DEPT | 341 | MISDEMEANOR | INSIDE | PETIT LARCENY | NaN | PATROL BORO MAN SOUTH | 333.0 | LARCENY,PETIT FROM STORE-SHOPL | CLOTHING/BOUTIQUE | 01/05/2020 | NaN | UNKNOWN | BLACK | M | NaN | UNKNOWN | UNKNOWN | D | 987873 | 212315 | 40.749440 | -73.986926 | (40.74943967000007, -73.98692557399994) | POINT (-73.98692557399994 40.74943967000007) |
| 413407 | 947490808 | 13 | MANHATTAN | 01/04/2020 | 18:25:00 | 01/04/2020 | 18:28:00 | COMPLETED | NaN | NaN | 0.0 | N.Y. POLICE DEPT | 341 | MISDEMEANOR | INSIDE | PETIT LARCENY | NaN | PATROL BORO MAN SOUTH | 333.0 | LARCENY,PETIT FROM STORE-SHOPL | DEPARTMENT STORE | 01/04/2020 | NaN | UNKNOWN | WHITE HISPANIC | M | NaN | UNKNOWN | UNKNOWN | D | 990238 | 209365 | 40.741341 | -73.978393 | (40.74134137300007, -73.97839260899997) | POINT (-73.97839260899997 40.74134137300007) |
| 413408 | 913801459 | 102 | QUEENS | 01/02/2020 | 20:30:00 | 01/03/2020 | 06:30:00 | COMPLETED | NaN | NaN | 0.0 | N.Y. POLICE DEPT | 121 | FELONY | FRONT OF | CRIMINAL MISCHIEF & RELATED OF | NaN | PATROL BORO QUEENS SOUTH | 269.0 | MISCHIEF,CRIMINAL, UNCL 2ND | RESIDENCE-HOUSE | 01/03/2020 | NaN | UNKNOWN | UNKNOWN | U | NaN | 45-64 | ASIAN / PACIFIC ISLANDER | M | 1032404 | 190239 | 40.688716 | -73.826366 | (40.68871610400004, -73.82636559499997) | POINT (-73.82636559499997 40.68871610400004) |
| 413409 | 927013283 | 24 | MANHATTAN | 01/02/2020 | 09:32:00 | 01/02/2020 | 09:36:00 | COMPLETED | NaN | NaN | 0.0 | N.Y. POLICE DEPT | 107 | FELONY | INSIDE | BURGLARY | NaN | PATROL BORO MAN NORTH | 211.0 | BURGLARY,COMMERCIAL,DAY | CHAIN STORE | 01/02/2020 | NaN | 25-44 | BLACK | M | NaN | UNKNOWN | UNKNOWN | D | 991075 | 227074 | 40.789947 | -73.975354 | (40.78994739900003, -73.97535415699997) | POINT (-73.97535415699997 40.78994739900003) |
| 413410 | 844073735 | 50 | BRONX | 01/05/2020 | 12:55:00 | 01/05/2020 | 13:07:00 | COMPLETED | NaN | NaN | 0.0 | N.Y. POLICE DEPT | 347 | MISDEMEANOR | NaN | INTOXICATED & IMPAIRED DRIVING | NaN | PATROL BORO BRONX | 905.0 | INTOXICATED DRIVING,ALCOHOL | STREET | 01/05/2020 | NaN | 45-64 | WHITE HISPANIC | M | NaN | UNKNOWN | UNKNOWN | E | 1014211 | 260753 | 40.882338 | -73.891652 | (40.88233829700005, -73.89165215599996) | POINT (-73.89165215599996 40.88233829700005) |
| 413411 | 871721952 | 115 | QUEENS | 01/01/2020 | 05:20:00 | 01/01/2020 | 05:25:00 | COMPLETED | NaN | NaN | 0.0 | N.Y. POLICE DEPT | 106 | FELONY | INSIDE | FELONY ASSAULT | NaN | PATROL BORO QUEENS NORTH | 109.0 | ASSAULT 2,1,UNCLASSIFIED | RESIDENCE-HOUSE | 01/01/2020 | NaN | 25-44 | WHITE HISPANIC | F | NaN | 25-44 | WHITE HISPANIC | M | 1019201 | 212389 | 40.749574 | -73.873858 | (40.74957445600006, -73.87385847499998) | POINT (-73.87385847499998 40.74957445600006) |